ASF JIRA
Displaying 1000 issues at 19/Mar/20 20:35.
Project Key Summary Issue Type Status Priority Resolution Assignee Reporter Creator Created Last Viewed Updated Resolved Affects Version/s Fix Version/s Component/s Due Date Votes Watchers Images Original Estimate Remaining Estimate Time Spent Work Ratio Sub-Tasks Linked Issues Environment Description Security Level Progress Σ Progress Σ Time Spent Σ Remaining Estimate Σ Original Estimate Labels Git Notification Mailing List Github Integration Git Repository Name Global Rank Git Repository Type Blog Administrator? Blogs - Admin for blog Blogs - Username Blogs - Email Address Docs Text Git Repository Import Path New-TLP-TLPName Blogs - New Blog Write Access Epic Colour Blogs - Existing Blog Name Enable Automatic Patch Review Attachment count Blog - New Blog PMC Epic Name Blog - New Blog Administrators Epic Status Blog - Write access Epic Link Change Category Bug Category Bugzilla - List of usernames Bugzilla - PMC Name Test and Documentation Plan Bugzilla - Email Notification Address Discovered By Blogs - Existing Blog Access Level Complexity Bugzilla - Project Name Severity Initial Confluence Contributors Space Name Space Description Space Key Sprint Rank (Obsolete) Project Machine Readable Info Review Patch? 
Flags Source Control Link Authors Development Reviewers Ignite Flags Date of First Response Github Integrations - Other Last public comment date Skill Level Affects version (Component) Backport to Version Fix version (Component) Skill Level Existing GitBox Approval Protected Branch GitHub Options Release Note Hadoop Flags Tags Bugzilla Id Level of effort Target Version/s Bug behavior facts Lucene Fields Github Integration - Triggers Workaround Bugzilla Id INFRA - Subversion Repository Path Testcase included Estimated Complexity Regression Review Date Evidence Of Use On World Wide Web Evidence Of Registration Epic/Theme Flagged External issue ID Priority Reproduced In Tags Since Version Reviewer External issue URL Hadoop Flags Issue & fix info Evidence Of Open Source Adoption Rank Severity Tester
ZooKeeper ZOOKEEPER-1742

"make check" doesn't work on macos

Bug Open Major Unresolved Michael Han Flavio Paiva Junqueira Flavio Paiva Junqueira 21/Aug/13 11:51   05/Feb/20 07:16   3.4.5, 3.5.0 3.7.0, 3.5.8     0 6   I spotted two problems when running "make check" with the C client. First, it complains that sleep is not declared in two test files: tests/ZooKeeperQuorumServer.cc and tests/TestReconfigServer.cc. Including unistd.h fixes this. The second problem is with the linker options: it complains that "--wrap" is not a valid option. I'm not sure how to deal with this one yet, since I'm not sure why we are using it. 344801 No Perforce job exists for this issue. 11 345101
3 years, 32 weeks, 5 days ago 0|i1ngkn:
ZooKeeper ZOOKEEPER-1741

bin scripts don't dereference symlinks

Bug Resolved Trivial Duplicate Max Lapan Max Lapan Max Lapan 16/Aug/13 08:46   02/Oct/13 13:02 02/Oct/13 13:02 3.4.5   scripts   0 1   CentOS 5.8 Symlinks to the bin scripts are not dereferenced correctly (output with "set -x" added):
{noformat}
[root@tsthdp1 noarch]# which zookeeper-client
/usr/local/bin/zookeeper-client
[root@tsthdp1 noarch]# ls -la /usr/local/bin/zookeeper-client
lrwxrwxrwx 1 root root 40 Aug 16 15:56 /usr/local/bin/zookeeper-client -> /usr/local/hadoop/zookeeper/bin/zkCli.sh
[root@tsthdp1 noarch]# ls -la /usr/local/hadoop/zookeeper/bin
total 36
drwxr-xr-x 2 root root 4096 Aug 16 16:24 .
drwxr-xr-x 5 root root 4096 Aug 16 15:56 ..
-rwxr-xr-x 1 root root 1909 Aug 16 15:56 zkCleanup.sh
-rwxr-xr-x 1 root root 1536 Aug 16 16:22 zkCli.sh
-rwxr-xr-x 1 root root 2599 Aug 16 15:56 zkEnv.sh
-rwxr-xr-x 1 root root 4559 Aug 16 15:56 zkServer-initialize.sh
-rwxr-xr-x 1 root root 6246 Aug 16 15:56 zkServer.sh
[root@tsthdp1 noarch]# zookeeper-client
+ ZOOBIN=/usr/local/bin/zookeeper-client
++ dirname /usr/local/bin/zookeeper-client
+ ZOOBIN=/usr/local/bin
++ cd /usr/local/bin
++ pwd
+ ZOOBINDIR=/usr/local/bin
+ '[' -e /usr/local/bin/../libexec/zkEnv.sh ']'
+ . /usr/local/bin/zkEnv.sh
/usr/local/bin/zookeeper-client: line 37: /usr/local/bin/zkEnv.sh: no such file or directory
{noformat}
344055 No Perforce job exists for this issue. 1 344357
6 years, 25 weeks, 1 day ago 0|i1nbzz:
ZooKeeper ZOOKEEPER-1740

Zookeeper 3.3.4 loses ephemeral nodes under stress

Bug Resolved Critical Fixed Flavio Paiva Junqueira Neha Narkhede Neha Narkhede 15/Aug/13 14:18   06/Feb/16 23:15 06/Feb/16 23:15 3.3.4   server   3 10   The current behavior of zookeeper for ephemeral nodes is that session expiration and ephemeral node deletion are not performed as a single atomic operation.

The side effect of the above zookeeper behavior in Kafka, for certain corner cases, is that ephemeral nodes can be lost even if the session is not expired. The sequence of events that can lead to lost ephemeral nodes is as follows:

1. The session expires on the client. The client assumes its ephemeral nodes are deleted, so it establishes a new session with zookeeper and tries to re-create the ephemeral nodes.
2. However, when it tries to re-create an ephemeral node, zookeeper returns a NodeExists error code. This is legitimate during a session disconnect event (since zkclient automatically retries the operation and raises a NodeExists error). Also, by design, the Kafka server doesn't have multiple zookeeper clients create the same ephemeral node, so the Kafka server assumes the NodeExists is normal.
3. However, after a few seconds zookeeper deletes that ephemeral node. So from the client's perspective, even though the client has a new valid session, its ephemeral node is gone.

This behavior is triggered by very long fsync operations on the zookeeper leader. When the leader wakes up from such a long fsync operation, it has several sessions to expire, and the time between session expiration and ephemeral node deletion is magnified. Between these two operations, a zookeeper client can issue an ephemeral node creation operation that may appear to have succeeded, but the leader later deletes the ephemeral node, leading to permanent ephemeral node loss from the client's perspective.

Thread from zookeeper mailing list: http://zookeeper.markmail.org/search/?q=Zookeeper+3.3.4#query:Zookeeper%203.3.4%20date%3A201307%20+page:1+mid:zma242a2qgp6gxvx+state:results

To reproduce this behavior:

1. Bring up a zookeeper 3.3.4 cluster and create several sessions with ephemeral nodes on it using zkclient. Make sure the session expiration callback is implemented and that it re-registers the ephemeral node.
2. Run the following script on the zookeeper leader, passing the leader's process ID as the first argument:
{code}
while true
do
  kill -STOP $1
  sleep 8
  kill -CONT $1
  sleep 60
done
{code}
3. Run another script to check for the existence of the ephemeral nodes.

This shows that zookeeper loses the ephemeral nodes while the clients still have a valid session.
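A common client-side defense against steps 2 and 3 above is to not assume that NodeExists means the node is ours, but to compare the existing node's ephemeral owner with the current session ID. The real ZooKeeper Java API exposes these values as Stat#getEphemeralOwner() and ZooKeeper#getSessionId(). The sketch below keeps only the decision logic so it runs without a ZooKeeper jar; the class and enum names are hypothetical, and this is not what Kafka or zkclient actually do.

```java
// Hypothetical helper. In real code, ephemeralOwner would come from
// zk.exists(path, false).getEphemeralOwner() and mySessionId from zk.getSessionId().
final class EphemeralOwnership {
    private EphemeralOwnership() {}

    enum Action {
        ALREADY_OURS,   // node was created by the current session (e.g. a retried create)
        WAIT_AND_RETRY, // node belongs to another, possibly expired, session
        NOT_EPHEMERAL   // a persistent node occupies the path
    }

    /** Classifies a NodeExists result when re-creating an ephemeral node. */
    static Action onNodeExists(long ephemeralOwner, long mySessionId) {
        if (ephemeralOwner == 0L) {
            // ZooKeeper reports an ephemeral owner of 0 for persistent nodes.
            return Action.NOT_EPHEMERAL;
        }
        if (ephemeralOwner == mySessionId) {
            return Action.ALREADY_OURS;
        }
        // Owned by an old session the leader has not cleaned up yet: the node
        // may still be deleted, so watch for its deletion and re-create it
        // instead of assuming it is ours.
        return Action.WAIT_AND_RETRY;
    }
}
```

With this check, the NodeExists in step 2 would be classified as WAIT_AND_RETRY (the node still belongs to the expired session), so the client would not silently lose its node in step 3.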

343898 No Perforce job exists for this issue. 0 344200
4 years, 6 weeks, 4 days ago 0|i1nb13:
ZooKeeper ZOOKEEPER-1739

Thread-safety bug in FastLeaderElection: the WorkerSender instance is not safely published, so the WorkerSender thread may see WorkerSender.manager as its default value, null

Bug Open Minor Unresolved qingjie qiao qingjie qiao qingjie qiao 09/Aug/13 00:29   10/Aug/13 23:01   3.4.5   leaderElection   0 4   I was reading the trunk source code recently and found what looks like a thread-safety problem, but I'm not quite sure.

In FastLeaderElection:

{code}
class WorkerSender implements Runnable {
volatile boolean stop;
QuorumCnxManager manager;

WorkerSender(QuorumCnxManager manager){
this.stop = false;
this.manager = manager;
}

public void run() {
...
}
}

...

Messenger(QuorumCnxManager manager) {

this.ws = new WorkerSender(manager);

Thread t = new Thread(this.ws,
"WorkerSender[myid=" + self.getId() + "]");
t.setDaemon(true);
t.start();

this.wr = new WorkerReceiver(manager);

t = new Thread(this.wr,
"WorkerReceiver[myid=" + self.getId() + "]");
t.setDaemon(true);
t.start();
}
...
{code}

The WorkerSender instance is constructed in the main thread, where its manager field is assigned, and is then used in another thread. That other thread may see WorkerSender.manager as its default value, null. Possible solutions:
(1) change
{code}
WorkerSender(QuorumCnxManager manager){
this.stop = false;
this.manager = manager;
}
{code}

to

{code}
WorkerSender(QuorumCnxManager manager){
this.manager = manager;
this.stop = false;
}
{code}

or (2)
change

{code}
QuorumCnxManager manager;
{code}

to

{code}
final QuorumCnxManager manager;
{code}
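Option (2) can be sketched as follows. The Java Memory Model guarantees that a thread which obtains a reference to an object after its constructor has finished sees the correctly initialized values of its final fields (JLS 17.5). It may also be worth noting that a call to Thread.start() happens-before any action in the started thread (JLS 17.4.5); since the Messenger constructor above assigns the field before starting the thread, this suggests the worker would already see a non-null manager in the code as shown, and making the field final mainly documents intent and keeps it safe if the object is ever handed off some other way. FakeCnxManager and DemoWorkerSender are hypothetical stand-ins, not ZooKeeper code, and a single run cannot prove the absence of a race.

```java
// Hypothetical stand-in for QuorumCnxManager.
final class FakeCnxManager { }

// Mirrors the shape of WorkerSender with option (2) applied.
final class DemoWorkerSender implements Runnable {
    volatile boolean stop;
    final FakeCnxManager manager;          // final: safely published (JLS 17.5)
    volatile FakeCnxManager seenByWorker;  // what the worker thread observed

    DemoWorkerSender(FakeCnxManager manager) {
        this.manager = manager;
        this.stop = false;
    }

    @Override
    public void run() {
        seenByWorker = manager; // record the value visible to this thread
    }

    /** Starts the worker the way Messenger does and returns what it saw. */
    static FakeCnxManager startAndObserve(FakeCnxManager m) {
        DemoWorkerSender ws = new DemoWorkerSender(m);
        Thread t = new Thread(ws, "WorkerSender[demo]");
        t.setDaemon(true);
        t.start(); // start() happens-before every action in the new thread
        try {
            t.join(); // join() makes the worker's writes visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return ws.seenByWorker;
    }
}
```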
342783 No Perforce job exists for this issue. 1 343087
6 years, 32 weeks, 4 days ago 0|i1n45r:
ZooKeeper ZOOKEEPER-1738

Xid out of order from a 3.4.5 client to a 3.3.5 cluster

Bug Resolved Major Invalid Unassigned Vincent Bernat Vincent Bernat 07/Aug/13 10:43   24/Oct/13 01:41 24/Oct/13 01:41 3.3.5       0 1   Server: zookeeper 3.3.5+dfsg1-1ubuntu1
Client: zookeeper 3.4.5 from Cloudera 4.3.0
This happens in the context of HBase master nodes getting connections from HBase region servers. Once an HBase region server joins the cluster, I get the following error:

{code}
2013-08-07 13:35:18,676 WARN org.apache.zookeeper.ClientCnxn: Session 0xd4058c4d7940003 for server zk-01.dev.dailymotion.com/10.194.60.13:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Xid out of order. Got Xid 56 with err -101 expected Xid 55 for a packet with details: clientPath:null serverPath:null finished:false header:: 55,14 replyHeader:: 0,0,-4 request:: org.apache.zookeeper.MultiTransactionRecord@360193e5 response:: org.apache.zookeeper.MultiResponse@0
at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:795)
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:94)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
2013-08-07 13:35:18,676 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
2013-08-07 13:35:18,676 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper multi failed after 3 retries
2013-08-07 13:35:18,677 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Unable to ensure that the table -ROOT- will be enabled because of a ZooKeeper issue
2013-08-07 13:35:18,677 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
2013-08-07 13:35:18,677 FATAL org.apache.hadoop.hbase.master.HMaster: Unable to ensure that the table -ROOT- will be enabled because of a ZooKeeper issue
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931)
at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.multi(RecoverableZooKeeper.java:531)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.multiOrSequential(ZKUtil.java:1440)
at org.apache.hadoop.hbase.zookeeper.ZKTable.setTableState(ZKTable.java:245)
at org.apache.hadoop.hbase.zookeeper.ZKTable.setEnabledTable(ZKTable.java:325)
at org.apache.hadoop.hbase.master.AssignmentManager.setEnabledTable(AssignmentManager.java:3576)
at org.apache.hadoop.hbase.master.AssignmentManager.setEnabledTable(AssignmentManager.java:2340)
at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1674)
at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424)
at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399)
at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394)
at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105)
at org.apache.hadoop.hbase.master.AssignmentManager.addToRITandCallClose(AssignmentManager.java:675)
at org.apache.hadoop.hbase.master.AssignmentManager.processRegionsInTransition(AssignmentManager.java:586)
at org.apache.hadoop.hbase.master.AssignmentManager.processRegionInTransition(AssignmentManager.java:525)
at org.apache.hadoop.hbase.master.AssignmentManager.processRegionInTransitionAndBlockUntilAssigned(AssignmentManager.java:489)
at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:679)
at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:583)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:395)
at java.lang.Thread.run(Thread.java:722)
2013-08-07 13:35:18,678 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
2013-08-07 13:35:18,678 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Server stopped; skipping assign of -ROOT-,,0.70236052 state=OFFLINE, ts=1375881792131, server=null
2013-08-07 13:35:18,678 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Waiting on 70236052/-ROOT-
2013-08-07 13:35:18,678 INFO org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor: masternode-01.dev.dailymotion.com,60000,1375880747185.timeoutMonitor exiting
2013-08-07 13:35:18,679 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=masternode-01.dev.dailymotion.com,60000,1375880747185, region=70236052/-ROOT-, which is more than 15 seconds late
2013-08-07 13:35:18,776 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/root-region-server
2013-08-07 13:35:18,776 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
2013-08-07 13:35:18,776 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper getData failed after 3 retries
2013-08-07 13:35:18,777 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 2000ms before retry #1...
{code}
342399 No Perforce job exists for this issue. 0 342704
6 years, 22 weeks ago 0|i1n1t3:
ZooKeeper ZOOKEEPER-1737

zk scripts no longer work when symlinked

Bug Patch Available Major Unresolved Chris Seawood Chris Seawood Chris Seawood 05/Aug/13 15:12   30/Sep/13 20:14   3.4.5   scripts   0 1   RHEL6.4 At some point after 3.3, the shell scripts were changed from using readlink to using BASH_SOURCE. The problem is that BASH_SOURCE doesn't resolve symlinks, so when /usr/bin/zookeeper-cli is symlinked to /usr/lib/zookeeper/bin/zkCli.sh, the script fails every single time.

341952 No Perforce job exists for this issue. 1 342258
6 years, 25 weeks, 3 days ago 0|i1mz27:
ZooKeeper ZOOKEEPER-1736

Zookeeper SASL authentication allows anonymous users to log in

Bug Resolved Major Not A Problem Unassigned AntonioS AntonioS 26/Jul/13 04:40   19/Mar/19 08:58 10/Oct/13 13:43     server   0 7   Development Hello.
I have configured ZooKeeper to provide SASL authentication, using an ordinary username and password stored in jaas.conf with the DigestLoginModule.
I have created a simple jaas.conf file:

{noformat}
Server {
org.apache.zookeeper.server.auth.DigestLoginModule required
user_admin="admin";
};
Client {
org.apache.zookeeper.server.auth.DigestLoginModule required
username="admin"
password="admin";
};
{noformat}

I have the zoo.cfg correctly configured for security, adding the following:
{noformat}
requireClientAuthScheme=sasl
authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
jaasLoginRenew=3600000
zookeeper.allowSaslFailedClients=false
{noformat}

And I also have the java.env file:
{noformat}
export JVMFLAGS="-Djava.security.auth.login.config=/etc/zookeeper/conf/jaas.conf -Dzookeeper.allowSaslFailedClients=false"
{noformat}


Everything looks good. If I supply the right username and password, I am authenticated; otherwise I am not, and I get an exception.
The problem is that when I don't supply any username and password at all, ZooKeeper lets me through.
I tried different things, but nothing stops anonymous users from logging in.
I was looking at the source code, in particular this method in ZooKeeperServer.java:

{code}
public void processPacket(ServerCnxn cnxn, ByteBuffer incomingBuffer) throws IOException {
{code}

The section below:

{code}
} else {
    if (h.getType() == OpCode.sasl) {
        Record rsp = processSasl(incomingBuffer,cnxn);
        ReplyHeader rh = new ReplyHeader(h.getXid(), 0, KeeperException.Code.OK.intValue());
        cnxn.sendResponse(rh,rsp, "response"); // not sure about 3rd arg..what is it?
    }
    else {
        Request si = new Request(cnxn, cnxn.getSessionId(), h.getXid(),
                h.getType(), incomingBuffer, cnxn.getAuthInfo());
        si.setOwner(ServerCnxn.me);
        submitRequest(si);
    }
}
{code}

The else flow appears to just forward any anonymous request to the handler, without attempting any authentication.

Is this a bug? Is there any way to stop anonymous users from connecting to ZooKeeper?
Thanks

Antonio


ssl-tls 340184 No Perforce job exists for this issue. 0 340502
1 year, 13 weeks, 1 day ago 0|i1mo8n:
ZooKeeper ZOOKEEPER-1735

ZOOKEEPER-1722 Make ZooKeeper easier to test - support simulating a connection loss

Sub-task Open Major Unresolved Unassigned Jordan Zimmerman Jordan Zimmerman 22/Jul/13 17:51   13/Aug/13 03:28       java client   0 1   As part of making ZooKeeper clients more test friendly, it would be useful to easily simulate a connection loss event 339399 No Perforce job exists for this issue. 0 339719
6 years, 35 weeks, 3 days ago 0|i1mjev:
ZooKeeper ZOOKEEPER-1734

Zookeeper fails to connect if one zookeeper host is down on EC2 when using elastic IP (UnknownHostException)

Bug Open Major Unresolved Unassigned Andy Grove Andy Grove 22/Jul/13 16:28   15/Jul/14 13:22   3.4.5   java client   1 7   Amazon EC2. Linux. We use Amazon Elastic IP for zookeeper hosts so that the zookeeper hosts have the same IP address after a restart.

The issue is, if one host is down then we cannot connect to the other hosts.

Here is an example connect string:

"ec2-1-2-3-4.compute-1.amazonaws.com, ec2-4-3-2-1.compute-1.amazonaws.com, ec2-5-5-5-5.compute-1.amazonaws.com"

If all three hosts are up, we can connect. If one host is down, we cannot create a ZooKeeper instance because of an UnknownHostException, even though the other servers in the connect string are valid.

{noformat}
java.net.UnknownHostException: ec2-5-5-5-5.compute-1.amazonaws.com
at java.net.InetAddress.getAllByName0(InetAddress.java:1243)
at java.net.InetAddress.getAllByName(InetAddress.java:1155)
at java.net.InetAddress.getAllByName(InetAddress.java:1091)
at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:60)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
{noformat}
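The trace shows the StaticHostProvider constructor resolving hosts with InetAddress.getAllByName, so a single unresolvable entry aborts the whole ZooKeeper constructor. The sketch below shows the behavior the report asks for instead: resolve each entry independently, skip the ones that fail, and give up only if none resolve. ConnectStringResolver is hypothetical (it is not ZooKeeper's HostProvider API); it takes a default port for entries without one, and its simple host:port split does not handle IPv6 literals.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

final class ConnectStringResolver {
    private ConnectStringResolver() {}

    /**
     * Parses "host[:port],host[:port],..." and keeps only the entries that resolve.
     * InetSocketAddress(String, int) does not throw on an unknown host; it
     * returns an address whose isUnresolved() is true, which is skipped here.
     */
    static List<InetSocketAddress> resolvable(String connectString, int defaultPort) {
        List<InetSocketAddress> out = new ArrayList<>();
        for (String raw : connectString.split(",")) {
            String entry = raw.trim(); // tolerate ", " separators as in the report
            if (entry.isEmpty()) {
                continue;
            }
            int colon = entry.lastIndexOf(':');
            String host = colon >= 0 ? entry.substring(0, colon) : entry;
            int port = colon >= 0 ? Integer.parseInt(entry.substring(colon + 1)) : defaultPort;
            InetSocketAddress addr = new InetSocketAddress(host, port);
            if (addr.isUnresolved()) {
                System.err.println("Skipping unresolvable host: " + host);
            } else {
                out.add(addr);
            }
        }
        if (out.isEmpty()) {
            throw new IllegalArgumentException("No resolvable hosts in: " + connectString);
        }
        return out;
    }
}
```

For example, resolvable("127.0.0.1:2181, 127.0.0.2", 2181) returns two addresses, both on port 2181.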
339383 No Perforce job exists for this issue. 0 339703
5 years, 36 weeks, 2 days ago 0|i1mjbb:
ZooKeeper ZOOKEEPER-1733

FLETest#testLE is flaky on windows boxes

Bug Closed Major Fixed Jeffrey Zhong Jeffrey Zhong Jeffrey Zhong 19/Jul/13 20:31   13/Mar/14 14:17 18/Dec/13 10:48 3.4.5 3.4.6, 3.5.0     0 6   FLETest#testLE fails intermittently on Windows boxes. The reason is that in LEThread#run() we have:
{code}
if(leader == i){
synchronized(finalObj){
successCount++;
if(successCount > (count/2)) finalObj.notify();
}

break;
}
{code}

Basically, once we have a confirmed leader, the leader thread exits because of the "break" out of the while loop.

In the verification step, we then check whether the leader thread is alive as follows:
{code}
if(threads.get((int) leader).isAlive()){
Assert.fail("Leader hasn't joined: " + leader);
}
{code}
On Windows boxes, the above verification step fails frequently because the leader thread has most likely already exited.

Do we know why we have the leader-alive verification step at all, when only the leader thread can bump successCount up to >= count/2?
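The verification snippet above samples isAlive() at a single instant, which makes the outcome depend on thread timing. One common way to make such a check deterministic (a sketch, not the fix that was committed for this issue) is to join each thread with a timeout before calling isAlive(), so a thread that is about to exit is given time to do so. JoinBeforeCheck is a hypothetical helper, not part of FLETest.

```java
import java.util.ArrayList;
import java.util.List;

final class JoinBeforeCheck {
    private JoinBeforeCheck() {}

    /**
     * Returns the indices of threads that are still alive after each one was
     * given up to timeoutMs to finish. Calling join(timeout) first avoids
     * sampling isAlive() at an arbitrary, racy moment.
     */
    static List<Integer> stillRunning(List<Thread> threads, long timeoutMs) {
        List<Integer> alive = new ArrayList<>();
        for (int i = 0; i < threads.size(); i++) {
            Thread t = threads.get(i);
            try {
                t.join(timeoutMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            if (t.isAlive()) {
                alive.add(i);
            }
        }
        return alive;
    }
}
```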
339050 No Perforce job exists for this issue. 3 339370
6 years, 2 weeks ago
Reviewed
0|i1mh9b:
ZooKeeper ZOOKEEPER-1732

ZooKeeper server unable to join established ensemble

Bug Closed Blocker Fixed Germán Blanco Germán Blanco Germán Blanco 19/Jul/13 12:14   13/Mar/14 14:17 29/Oct/13 23:22 3.4.5 3.4.6, 3.5.0 leaderElection   0 12   Windows 7, Java 1.7 I have a test in which I do a rolling restart of three ZooKeeper servers and it was failing from time to time.
I ran the tests in a loop until the failure reproduced, and it seems that at some point one of the servers is unable to join the ensemble formed by the other two.
338972 No Perforce job exists for this issue. 13 339292
6 years, 2 weeks ago
Reviewed
0|i1mgrz:
ZooKeeper ZOOKEEPER-1731

Unsynchronized access to ServerCnxnFactory.connectionBeans results in deadlock

Bug Closed Critical Fixed Dave Latham Dave Latham Dave Latham 16/Jul/13 13:59   02/Mar/16 20:33 02/Aug/13 13:45   3.4.6     0 8   We had a cluster of 3 peers (running 3.4.3) fail after we took down 1 peer briefly for maintenance. A second peer became unresponsive and the leader lost quorum. Thread dumps on the second peer showed two threads consistently stuck in these states:

{noformat}
"QuorumPeer[myid=0]/0.0.0.0:2181" prio=10 tid=0x00002aaab8d20800 nid=0x598a runnable [0x000000004335d000]
java.lang.Thread.State: RUNNABLE
at java.util.HashMap.put(HashMap.java:405)
at org.apache.zookeeper.server.ServerCnxnFactory.registerConnection(ServerCnxnFactory.java:131)
at org.apache.zookeeper.server.ZooKeeperServer.finishSessionInit(ZooKeeperServer.java:572)
at org.apache.zookeeper.server.quorum.Learner.revalidate(Learner.java:444)
at org.apache.zookeeper.server.quorum.Follower.processPacket(Follower.java:133)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:86)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)


"NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181" daemon prio=10 tid=0x00002aaab84b0800 nid=0x5986 runnable [0x0000000040878000]
java.lang.Thread.State: RUNNABLE
at java.util.HashMap.removeEntryForKey(HashMap.java:614)
at java.util.HashMap.remove(HashMap.java:581)
at org.apache.zookeeper.server.ServerCnxnFactory.unregisterConnection(ServerCnxnFactory.java:120)
at org.apache.zookeeper.server.NIOServerCnxn.close(NIOServerCnxn.java:971)
- locked <0x000000078d8a51f0> (a java.util.HashSet)
at org.apache.zookeeper.server.NIOServerCnxnFactory.closeSessionWithoutWakeup(NIOServerCnxnFactory.java:307)
at org.apache.zookeeper.server.NIOServerCnxnFactory.closeSession(NIOServerCnxnFactory.java:294)
- locked <0x000000078d82c750> (a org.apache.zookeeper.server.NIOServerCnxnFactory)
at org.apache.zookeeper.server.ZooKeeperServer.processConnectRequest(ZooKeeperServer.java:834)
at org.apache.zookeeper.server.NIOServerCnxn.readConnectRequest(NIOServerCnxn.java:410)
at org.apache.zookeeper.server.NIOServerCnxn.readPayload(NIOServerCnxn.java:200)
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:236)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:224)
at java.lang.Thread.run(Thread.java:662)
{noformat}

The dump shows both threads concurrently modifying ServerCnxnFactory.connectionBeans, which is an unsynchronized java.util.HashMap.

This cluster was serving thousands of clients, which seems to make this condition more likely as it appears to occur when one client connects and another disconnects at about the same time.
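One straightforward remedy for this kind of concurrent modification (a sketch; not necessarily the change committed for this issue) is to back the registry with a ConcurrentHashMap, or to guard every access with the same lock. Unsynchronized concurrent put/remove on a java.util.HashMap can corrupt its internal structure, which is consistent with threads that stay RUNNABLE inside HashMap.put and HashMap.remove as in the dump above. ConnectionRegistry is a hypothetical class modeled loosely on connectionBeans; it is not ZooKeeper's ServerCnxnFactory.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical registry: connections register on connect and unregister on
// close, possibly from different threads at the same time.
final class ConnectionRegistry {
    // ConcurrentHashMap supports concurrent put/remove without external locking.
    private final Map<Long, String> beans = new ConcurrentHashMap<>();

    void register(long sessionId, String bean) { beans.put(sessionId, bean); }
    void unregister(long sessionId) { beans.remove(sessionId); }
    int size() { return beans.size(); }

    /** Registers and unregisters from two threads at once; returns the final size. */
    static int stress(int n) {
        ConnectionRegistry r = new ConnectionRegistry();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // Thread A: sessions 0..n-1 connect and then close.
        pool.submit(() -> {
            for (long i = 0; i < n; i++) { r.register(i, "a"); r.unregister(i); }
        });
        // Thread B: sessions n..2n-1 connect and stay registered.
        pool.submit(() -> {
            for (long i = n; i < 2L * n; i++) { r.register(i, "b"); }
        });
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return r.size();
    }
}
```

After stress(n), only thread B's n sessions remain registered.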
338260 No Perforce job exists for this issue. 1 338580
6 years, 2 weeks ago 0|i1mcdz:
ZooKeeper ZOOKEEPER-1730

ZOOKEEPER-1722 Make ZooKeeper easier to test - support simulating a session expiration

Sub-task Resolved Major Fixed Jordan Zimmerman Jordan Zimmerman Jordan Zimmerman 15/Jul/13 23:37   01/Apr/14 07:10 31/Mar/14 22:03   3.5.0 java client   0 4   As part of making ZooKeeper clients more test friendly, it would be useful to easily simulate a session loss event 338117 No Perforce job exists for this issue. 2 338438
5 years, 51 weeks, 2 days ago 0|i1mbif:
ZooKeeper ZOOKEEPER-1729

Add 4lw command "snap" to trigger log rotation and snapshotting

Improvement Open Minor Unresolved Thawan Kooburat Thawan Kooburat Thawan Kooburat 15/Jul/13 23:31   17/Feb/17 10:08       server   0 2   The "snap" command can be used to trigger log rotation and snapshotting on each server.

One use case for this command is to make server restarts faster by issuing the snap command before restarting the server. This helps when the txnlog is large (due to transaction size or number of transactions).

snap is a blocking command: it returns only after the snapshot is written to disk, so it is safe to call it prior to restarting the server.
338116 No Perforce job exists for this issue. 0 338437
6 years, 24 weeks, 2 days ago 0|i1mbi7:
ZooKeeper ZOOKEEPER-1728

Better error message when reconfig invoked in standalone mode

Improvement Resolved Minor Fixed Alexander Shraer Alexander Shraer Alexander Shraer 13/Jul/13 18:02   01/Apr/14 07:10 31/Mar/14 19:47 3.5.0 3.5.0     0 4   For now, reconfig is not supported in standalone mode, but when invoked it should return something better than the current ClassCastException.

The patch throws a KeeperException.UnimplementedException in this case (most errors are reported through exceptions).
337829 No Perforce job exists for this issue. 2 338151
5 years, 51 weeks, 2 days ago 0|i1m9qn:
ZooKeeper ZOOKEEPER-1727

Doc request: The right way to expand a cluster

Wish Resolved Minor Duplicate Alexander Shraer Justin SB Justin SB 12/Jul/13 21:31   13/Jul/13 15:58 13/Jul/13 15:43 3.5.0 3.5.0     0 4   When expanding a cluster from 2->3, if ZK server #3 isn't up yet, then it seems that the reconfig request times out with a connection-loss error. The configuration is updated though. So we could wait, reconnect, and then refetch the config to make sure we did join the quorum, though that seems a little bit hacky!

What is the correct way to do this (and cluster growth in general)? Should we bring up new ZK servers before issuing the reconfig command? What is the right way to bring up new ZK servers (connect as a client, request the config, save the config to the zk.conf.dynamic file, add our new server line to the new zk.conf.dynamic file, start the new server, call reconfig as a client to the existing cluster)?

Is this documented anywhere? (Just the steps to do it "the right way" would be great, no need for actual code) :-)
337777 No Perforce job exists for this issue. 0 338099
6 years, 36 weeks, 5 days ago 0|i1m9f3:
ZooKeeper ZOOKEEPER-1726

No way to dynamically go from 1 ZK server -> 2 ZK servers?

Bug Resolved Major Duplicate Unassigned Justin SB Justin SB 12/Jul/13 20:45   12/Jul/13 20:53 12/Jul/13 20:53 3.5.0       0 2   The dynamic reconfiguration feature is great. But it doesn't seem to be possible to go from 1 server to 2 servers (1 server + 1 observer). When there's only one server, ZK automatically starts in single server mode; when in single server mode trying to add a server causes a class cast exception because the server is a ZooKeeperServer, not a LeaderZooKeeperServer. 337774 No Perforce job exists for this issue. 0 338096
6 years, 36 weeks, 5 days ago 0|i1m9ef:
ZooKeeper ZOOKEEPER-1725

Zookeeper Dynamic Conf writes out hostnames when IPs are supplied

Bug Resolved Minor Fixed Michi Mutsuzaki Justin SB Justin SB 12/Jul/13 20:43   01/Apr/14 07:10 31/Mar/14 19:57 3.5.0 3.5.0     0 5   When writing the dynamic configuration out, Zookeeper writes out hostnames, even if an IP address is supplied. These may not correctly round-trip (e.g. 127.0.0.1 might be written as localhost which may then resolve to 127.0.0.1 and another IP address).

This isn't actually causing problems for me right now, but seems very likely to cause hard-to-track-down problems in future.
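The round-trip problem can be avoided by writing out the host string exactly as it was supplied, rather than a reverse-resolved name. In the JDK, InetSocketAddress.getHostString() returns the hostname or literal the address was created with, without a reverse lookup, whereas getHostName() may trigger a reverse lookup for an address created from a literal IP. ConfigAddressFormatter is a hypothetical helper, not the code ZooKeeper uses to write its dynamic configuration.

```java
import java.net.InetSocketAddress;

final class ConfigAddressFormatter {
    private ConfigAddressFormatter() {}

    /**
     * Formats host:port for a config file without a reverse DNS lookup, so an
     * address given as "127.0.0.1" is written back as "127.0.0.1", not "localhost".
     */
    static String format(InetSocketAddress addr) {
        return addr.getHostString() + ":" + addr.getPort();
    }
}
```

For example, format(new InetSocketAddress("127.0.0.1", 2888)) returns "127.0.0.1:2888".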
337773 No Perforce job exists for this issue. 3 338095
5 years, 51 weeks, 2 days ago 0|i1m9e7:
ZooKeeper ZOOKEEPER-1724

Support Kerberos authentication for non-SUN JDK

Improvement Open Major Unresolved Bing Li Bing Li Bing Li 08/Jul/13 21:23   05/Feb/20 07:16   3.4.5, 3.4.6, 3.5.0 3.7.0, 3.5.8     1 3   Currently, the Login class only supports running with the Sun JDK when Kerberos is enabled. To support alternative JDKs such as the IBM JDK, whose Krb5LoginModule supports different options, the Login class should be changed. 336992 No Perforce job exists for this issue. 1 337315
5 years, 39 weeks ago Support Kerberos authentication for non-SUN JDK 0|i1m4lb:
ZooKeeper ZOOKEEPER-1723

unique ensemble identifier

Bug Open Major Unresolved Unassigned Mohammad Shamma Mohammad Shamma 08/Jul/13 16:46   08/Jul/13 16:46       server   0 1   Zookeeper ensembles need an identifier that would prevent a misconfigured zookeeper server from clobbering the configuration of a zookeeper ensemble.

Use case:

- A zookeeper based distributed system that grows its zookeeper ensemble incrementally.
- The system is reset, where the new zookeeper ensemble is a subset of the old zookeeper ensemble (the history of the new ensemble has been reset too).
- The old zookeeper servers will attempt to communicate with the new servers (assuming the network end-points remain the same).
- The new zookeeper servers will notice that the old zookeeper servers have a higher configuration version and will attempt to reconfigure based on the old ensemble configuration info.

Note that this can be solved if the reset process would stop every zookeeper server in the old deployment and delete its history. However, some of these servers might be down at the time of reset, therefore this solution is not reliable.

I am sure this is not the most generic description of the problem of not having ensemble identifiers, but it presents a use case for introducing them to prevent servers from cross-talking across different ensembles. Otherwise they will automatically join in to form a single ensemble.
336902 No Perforce job exists for this issue. 0 337225
6 years, 37 weeks, 3 days ago 0|i1m41b:
ZooKeeper ZOOKEEPER-1722

Make ZooKeeper clients more test friendly

Improvement Open Major Unresolved Unassigned Thawan Kooburat Thawan Kooburat 08/Jul/13 13:57   15/Jul/13 23:45       c client, java client   0 4   ZOOKEEPER-1730, ZOOKEEPER-1735 We should expose a few more API calls that allow users to write unit tests covering various failure scenarios (similar to TestableZooKeeper in the zookeeper tests). This should also minimize the effort of setting up a test framework for application developers.

Here are some example calls that we should provide:
1. A zookeeper_close() that doesn't actually send a close request to the server: this can be used to simulate a client crash without actually crashing the test program.
2. Allow the client to trigger a CONNECTION_LOSS or SESSION_EXPIRE event: this will allow users to test their watchers and callbacks (and possible race conditions).
336864 No Perforce job exists for this issue. 0 337187
6 years, 36 weeks, 2 days ago 0|i1m3sv:
ZooKeeper ZOOKEEPER-1721

Ability to run without writing to disk

New Feature Open Major Unresolved Unassigned Radim Kolar Radim Kolar 06/Jul/13 11:51   09/Jul/13 11:46   3.4.5   server   0 2   I use zookeeper for cluster synchronization. We have no need to keep persistent state across zookeeper restarts. For better performance, it would be good to have the option to run without writing snapshots and logs. 336711 No Perforce job exists for this issue. 0 337034
6 years, 37 weeks, 2 days ago 0|i1m2uv:
ZooKeeper ZOOKEEPER-1720

Race in zookeeper_close() leads to hang

Bug Open Major Unresolved Unassigned Kevin Jamieson Kevin Jamieson 05/Jul/13 23:34   17/Oct/17 08:14   3.5.0   c client   1 4   Ubuntu 12.04.1 Using ZK 3.5.4, zookeeper_close() occasionally hangs with a backtrace of the form:

{noformat}
#0 0x00002b255fab489c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00002b255fab26b0 in pthread_cond_broadcast@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00002b2560568ced in unlock_completion_list (l=0x13f5430) at src/mt_adaptor.c:69
#3 0x00002b256055b9ec in free_completions (zh=0x13f5270, callCompletion=1, reason=-116) at src/zookeeper.c:1521
#4 0x00002b256055d3bc in zookeeper_close (zh=0x13f5270) at src/zookeeper.c:2954
{noformat}

At which point the zhandle_t struct appears to have already been freed, as it contains garbage:

{noformat}
(gdb) p zh->sent_requests.cond
$19 = {
__data = {
__lock = 2,
__futex = 0,
__total_seq = 18446744073709551615,
__wakeup_seq = 0,
__woken_seq = 0,
__mutex = 0x0,
__nwaiters = 0,
__broadcast_seq = 0
},
__size = "\002\000\000\000\000\000\000\000\377\377\377\377\377\377\377\377", '\000' <repeats 31 times>,
__align = 2
}
{noformat}

There appears to be a race condition in the following code:

{noformat}
int api_epilog(zhandle_t *zh,int rc)
{
    if(inc_ref_counter(zh,-1)==0 && zh->close_requested!=0)
        zookeeper_close(zh);
    return rc;
}

int zookeeper_close(zhandle_t *zh)
{
    int rc=ZOK;
    if (zh==0)
        return ZBADARGUMENTS;

    zh->close_requested=1;
    if (inc_ref_counter(zh,1)>1) {
{noformat}

This is because api_epilog() may free zh between zookeeper_close() setting zh->close_requested=1 and incrementing the reference count.

The following patch should fix the problem:

{noformat}
diff --git a/src/c/src/zookeeper.c b/src/c/src/zookeeper.c
index 6943243..61a263a 100644
--- a/src/c/src/zookeeper.c
+++ b/src/c/src/zookeeper.c
@@ -1051,6 +1051,7 @@ zhandle_t *zookeeper_init(const char *host, watcher_fn watcher,
goto abort;
}

+ api_prolog(zh);
return zh;
abort:
errnosave=errno;
@@ -2889,7 +2890,7 @@ int zookeeper_close(zhandle_t *zh)
return ZBADARGUMENTS;

zh->close_requested=1;
- if (inc_ref_counter(zh,1)>1) {
+ if (inc_ref_counter(zh,0)>1) {
/* We have incremented the ref counter to prevent the
* completions from calling zookeeper_close before we have
* completed the adaptor_finish call below. */
{noformat}
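The idea behind the patch (the handle holds a reference of its own from creation, so an in-flight call finishing cannot drop the count to zero and free the handle while close is still running) can be modeled in Java. This is a simplified, hypothetical model (`RefCountedHandle`, `prolog`, `epilog`, `close` are illustrative names), not a translation of the C client; in particular, close() here simply releases the creator's reference instead of mirroring the `inc_ref_counter(zh,0)` check.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of a reference-counted handle; not the C client's zhandle_t. */
final class RefCountedHandle {
    // Starts at 1: the handle's own reference, taken at creation
    // (the role the patch gives api_prolog(zh) in zookeeper_init).
    private final AtomicInteger refs = new AtomicInteger(1);
    private final AtomicBoolean closeRequested = new AtomicBoolean(false);
    private final AtomicInteger freeCount = new AtomicInteger(0);

    /** Start of an API call (compare api_prolog). Must not be called after close. */
    void prolog() {
        refs.incrementAndGet();
    }

    /** End of an API call (compare api_epilog). */
    void epilog() {
        release();
    }

    /** Requests close; drops the handle's own reference exactly once. */
    void close() {
        if (closeRequested.compareAndSet(false, true)) {
            release();
        }
    }

    /** Only the caller that moves the count to zero frees the handle. */
    private void release() {
        if (refs.decrementAndGet() == 0) {
            freeCount.incrementAndGet(); // stands in for freeing the handle
        }
    }

    boolean isFreed() {
        return freeCount.get() > 0;
    }

    int timesFreed() {
        return freeCount.get();
    }
}
```

Because the count never reaches zero while the creator's reference is held, and decrementAndGet() is atomic, exactly one party frees the handle regardless of whether close() or the last in-flight call finishes last.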
336664 No Perforce job exists for this issue. 0 336987
2 years, 22 weeks, 2 days ago 0|i1m2kf:
ZooKeeper ZOOKEEPER-1719

zkCli.sh, zkServer.sh and zkEnv.sh regression caused by ZOOKEEPER-1663

Bug Closed Major Fixed Marshall McMullen Marshall McMullen Marshall McMullen 25/Jun/13 14:00   25/Jul/14 07:25 28/Jun/13 12:28 3.4.5, 3.5.0 3.4.6, 3.5.0     0 5   Linux (Ubuntu 12.04) with dash shell This fix from ZOOKEEPER-1663 is incorrect. It assumes the shell is bash since it uses bash array construction, e.g.:

{code}
96 LIBPATH=("${ZOOKEEPER_PREFIX}"/share/zookeeper/*.jar)
{code}

This does NOT work if /bin/sh points to /bin/dash as it does on Ubuntu.

It fails as follows:

{quote}
zkEnv.sh: 96: zkEnv.sh: Syntax error: "(" unexpected (expecting "fi")
{quote}

If I change the shebang at the top to use "/bin/bash" instead of "/bin/sh", it works as expected. I don't know the full details of why a bash array was chosen as the solution, but I don't think it is the right way to deal with spaces in these paths...
335045 No Perforce job exists for this issue. 1 335369
5 years, 34 weeks, 6 days ago 0|i1lslj:
ZooKeeper ZOOKEEPER-1718

Support JLine 2

Test Resolved Critical Fixed Manikumar Christopher Tubbs Christopher Tubbs 19/Jun/13 13:29   18/Nov/14 19:56 01/Oct/13 17:19   3.5.0     1 8   not fixed 334033 No Perforce job exists for this issue. 2 334359
6 years, 25 weeks, 1 day ago JLine upgraded to version 2.11
Reviewed
0|i1lmef:
ZooKeeper ZOOKEEPER-1717

Flex code works in debug mode, not in run mode

Bug Open Major Unresolved Unassigned hareesh hareesh 14/Jun/13 11:22   01/Sep/13 09:27   4.0.0 4.0.0     0 2 43200 43200 0% In my Flex application, when I debug the code it gives the correct result, but when I run it in run mode it does not. I tried to find out what's happening, but I couldn't find anything. Could you please give me some suggestions?
0% 0% 43200 43200 333224 No Perforce job exists for this issue. 0 333552
6 years, 29 weeks, 4 days ago
Incompatible change, Reviewed
0|i1lhf3:
ZooKeeper ZOOKEEPER-1716

jute/Utils.fromCSVBuffer cannot parse data returned by toCSVBuffer

Bug Patch Available Major Unresolved Charlie Helin Robert Joseph Evans Robert Joseph Evans 11/Jun/13 17:48   14/Oct/15 17:01   3.5.0   jute   0 2   I was trying to use org.apache.zookeeper.server.LogFormatter to analyze the access pattern of a particular application. As part of this I wanted to get the size of the data that was being written into ZK.

I ran into an issue where, in some cases, the hex data had an odd length. I looked into it and found that the buffer is being written out using Integer.toHexString(barr[idx]).

According to the javadoc, toHexString does not pad its output with leading zeros, and for a negative number it outputs the unsigned 32-bit (two's complement) value. I then looked at how the data is parsed, and found that the parser assumes every byte consists of exactly two characters, which is not true.
{code}
Utils.toCSVBuffer(new byte[] {0xff}) returns "#ffffffff"
Utils.toCSVBuffer(new byte[] {0x01}) returns "#1"

If I combine those
Utils.fromCSVBuffer(Utils.toCSVBuffer(new byte[] {0xff, 0x01, 0xff})) will return {0xff, 0xff, 0xff, 0xff, 0x1f, 0xff, 0xff, 0xff}
{code}

I think what we want is something like
{code}
static final char[] NIBBLE_TO_HEX = {
    '0', '1', '2', '3', '4', '5', '6', '7',
    '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'
};

static String toCSVBuffer(byte barr[]) {
    if (barr == null || barr.length == 0) {
        return "";
    }
    // '#' plus exactly two hex digits per byte
    StringBuilder sb = new StringBuilder(barr.length * 2 + 1);
    sb.append('#');
    for (int idx = 0; idx < barr.length; idx++) {
        byte b = barr[idx];
        // high nibble first, so that 0x1f is written as "1f"
        sb.append(NIBBLE_TO_HEX[(b & 0xf0) >> 4]);
        sb.append(NIBBLE_TO_HEX[b & 0x0f]);
    }
    return sb.toString();
}
{code}
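A fixed-width encoder only helps if the decoder enforces the same format. The following self-contained Java sketch pairs a two-digits-per-byte encoder with a strict decoder; the class name `HexCsvSketch` is hypothetical, and unlike jute's `Utils.fromCSVBuffer` (which throws `IOException`) this version throws `IllegalArgumentException` to keep the example short.

```java
/** Hypothetical fixed-width hex codec; names and exception type differ from jute's Utils. */
final class HexCsvSketch {
    private static final char[] NIBBLE_TO_HEX = "0123456789abcdef".toCharArray();

    /** Encodes each byte as exactly two lowercase hex digits, high nibble first. */
    static String toCSVBuffer(byte[] barr) {
        if (barr == null || barr.length == 0) {
            return "";
        }
        StringBuilder sb = new StringBuilder(barr.length * 2 + 1);
        sb.append('#');
        for (byte b : barr) {
            sb.append(NIBBLE_TO_HEX[(b >> 4) & 0x0f]);
            sb.append(NIBBLE_TO_HEX[b & 0x0f]);
        }
        return sb.toString();
    }

    /** Decodes toCSVBuffer output; rejects odd-length or non-hex input instead of guessing. */
    static byte[] fromCSVBuffer(String s) {
        if (s.isEmpty()) {
            return new byte[0];
        }
        if (s.charAt(0) != '#' || (s.length() - 1) % 2 != 0) {
            throw new IllegalArgumentException("Malformed hex buffer: " + s);
        }
        byte[] out = new byte[(s.length() - 1) / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(s.charAt(1 + 2 * i), 16);
            int lo = Character.digit(s.charAt(2 + 2 * i), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Invalid hex digit in: " + s);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }
}
```

With this pair, `toCSVBuffer(new byte[] {(byte) 0xff, 0x01, (byte) 0xff})` returns `"#ff01ff"`, and decoding that string gives back the same three bytes.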
332607 No Perforce job exists for this issue. 1 332936
4 years, 23 weeks, 1 day ago 0|i1ldm7:
ZooKeeper ZOOKEEPER-1715

Upgrade netty version

Improvement Closed Major Fixed Sean Bridges Sean Bridges Sean Bridges 08/Jun/13 01:02   04/May/16 18:00 14/Dec/13 03:38 3.4.5 3.4.6, 3.5.0     2 5   ZooKeeper 3.4.5 uses Netty 3.2.2, which was released in August 2010. The latest version of Netty is 3.6.6, released in May 2013. ZooKeeper should consider upgrading. Upgrade netty version
332170 No Perforce job exists for this issue. 4 332499
6 years, 2 weeks ago 0|i1laxj:
ZooKeeper ZOOKEEPER-1714

perl client segfaults if ZOO_READ_ACL_UNSAFE constant is used

Bug Closed Minor Fixed Botond Hejj Botond Hejj Botond Hejj 06/Jun/13 09:06   13/Mar/14 14:17 21/Jun/13 16:01 3.4.5 3.4.6, 3.5.0 contrib-bindings   1 7   If the ZOO_READ_ACL_UNSAFE or ZOO_CREATOR_ALL_ACL constant is used, the client dumps core with a segmentation fault. 331648 No Perforce job exists for this issue. 2 331979
6 years, 2 weeks ago 0|i1l7qn:
ZooKeeper ZOOKEEPER-1713

wrong time calculation in zkfuse.cc

Bug Closed Trivial Fixed Germán Blanco Germán Blanco Germán Blanco 06/Jun/13 05:51   13/Mar/14 14:16 02/Sep/13 16:23 3.4.5 3.4.6, 3.5.0     0 5   Linux A colleague of mine has spotted this error in the time calculation in zkfuse.cc, lines 81 to 85:
{code}
inline
uint64_t nanosecsToMillisecs(uint64_t nanosecs)
{
    return nanosecs * 1000000;
}
{code}
I am not sure how this function is used, but if it is, it will certainly produce wrong results: converting nanoseconds to milliseconds requires dividing by 1000000, not multiplying.
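For comparison, the intended conversion written in Java (zkfuse.cc itself is C++), cross-checked against the standard `TimeUnit.NANOSECONDS.toMillis`. `TimeConversion` is a hypothetical class name, and Java's signed `long` stands in for `uint64_t`, which is fine for non-negative durations.

```java
import java.util.concurrent.TimeUnit;

/** Illustrative Java counterpart of the intended zkfuse.cc helper (hypothetical class name). */
final class TimeConversion {
    /** 1 ms = 1,000,000 ns, so the conversion divides (truncating) rather than multiplies. */
    static long nanosecsToMillisecs(long nanosecs) {
        return nanosecs / 1_000_000L;
    }

    /** The same conversion via the standard library, useful as a cross-check. */
    static long viaTimeUnit(long nanosecs) {
        return TimeUnit.NANOSECONDS.toMillis(nanosecs);
    }
}
```

For 2,500,000,000 ns (2.5 s), both methods return 2500 ms, whereas the multiplying version would return 2.5 × 10^15.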
331624 No Perforce job exists for this issue. 1 331955
6 years, 2 weeks ago 0|i1l7lb:
ZooKeeper ZOOKEEPER-1712

transient test failure in TestReconfig.cc

Bug Resolved Major Duplicate Marshall McMullen Camille Fournier Camille Fournier 04/Jun/13 13:00   16/Apr/16 09:04 04/Jun/13 13:24         0 3   zktest-mt From the latest build logs:
[exec] Zookeeper_watchers::testChildWatcher2 : elapsed 54 : OK
[exec] /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/trunk/src/c/tests/TestReconfig.cc:183: Assertion: equality assertion failed [Expected: 1, Actual : 0]
[exec] Failures !!!
[exec] Run: 67 Failure total: 1 Failures: 1 Errors: 0
[exec] FAIL: zktest-mt
[exe
331260 No Perforce job exists for this issue. 0 331593
6 years, 42 weeks, 2 days ago 0|i1l5cv:
ZooKeeper ZOOKEEPER-1711

ZooKeeper server binds to all ip addresses for leader election and broadcast

Bug Closed Minor Duplicate Unassigned Germán Blanco Germán Blanco 31/May/13 07:16   13/Mar/14 14:17 29/Aug/13 09:22 3.4.5 3.4.6 server   1 3 259200 259200 0% Any Unlike the current ZooKeeper version in trunk (intended for release as 3.5.0), ZooKeeper server 3.4.5 binds to all IP addresses on the specified election port. It only makes sense to bind to the IP address given in the configuration file, since that is where the other servers will connect. Listening on other IP addresses could have bad security implications. 0% 0% 259200 259200 330638 No Perforce job exists for this issue. 0 330972
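A minimal Java sketch of binding a listener to one configured address rather than to all interfaces, using the standard `java.net` API. `ElectionListener` and `bindToConfiguredAddress` are hypothetical names; this is not the 3.4.5 election code, where the host and port would come from the server's entry in the configuration file.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

/** Hypothetical helper, not ZooKeeper's QuorumCnxManager. */
final class ElectionListener {
    /**
     * Binds only to the configured address. By contrast, new InetSocketAddress(port)
     * uses the wildcard address and listens on every interface.
     */
    static ServerSocket bindToConfiguredAddress(String host, int port) throws IOException {
        ServerSocket ss = new ServerSocket();
        try {
            ss.setReuseAddress(true);
            ss.bind(new InetSocketAddress(host, port));
            return ss;
        } catch (IOException e) {
            ss.close(); // do not leak the socket if binding fails
            throw e;
        }
    }
}
```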
6 years, 2 weeks ago 0|i1l1j3:
ZooKeeper ZOOKEEPER-1710

Leader should not use txnlog for synchronization if txnlog is corrupted or missing

Improvement Open Minor Unresolved Unassigned Thawan Kooburat Thawan Kooburat 17/May/13 21:45   31/May/13 18:52   3.5.0   server   0 3   Human error can cause some txnlog files to be removed from the log dir.

The leader should not use the txnlog to synchronize with a learner if it finds that a log file is missing or corrupted, since this can cause data inconsistency.
328636 No Perforce job exists for this issue. 0 328979
6 years, 44 weeks, 5 days ago 0|i1kp93:
ZooKeeper ZOOKEEPER-1709

Limit the size of txnlog file

Improvement Open Minor Unresolved Thawan Kooburat Thawan Kooburat Thawan Kooburat 17/May/13 21:38   17/May/13 21:40   3.5.0   server   0 2   The server only creates a new log file after ~100k txns. If requests are large, a txnlog file can become quite large (> 1GB).

This will cause the server not to use the txnlog to sync with the learner.

So we added a parameter so that the server creates a new txnlog file whenever the file size exceeds the limit.
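One possible shape for such a parameter, as a hedged Java sketch: roll to a new file when either the transaction count (the existing ~100k behavior) or the accumulated byte size reaches its limit. `TxnLogRollPolicy` and its parameters are hypothetical; the issue does not name the actual parameter.

```java
/** Hypothetical txnlog roll-over policy: start a new file on a count limit or a size limit. */
final class TxnLogRollPolicy {
    private final long maxTxnsPerFile;
    private final long maxBytesPerFile;
    private long txnsInFile = 0;
    private long bytesInFile = 0;

    TxnLogRollPolicy(long maxTxnsPerFile, long maxBytesPerFile) {
        this.maxTxnsPerFile = maxTxnsPerFile;
        this.maxBytesPerFile = maxBytesPerFile;
    }

    /**
     * Accounts for one txn of the given serialized size and returns true if it
     * must start a new file. A single txn larger than the size limit is still
     * written, alone, to its own file.
     */
    boolean append(long txnBytes) {
        boolean roll = txnsInFile > 0
                && (txnsInFile >= maxTxnsPerFile || bytesInFile + txnBytes > maxBytesPerFile);
        if (roll) {
            txnsInFile = 0;
            bytesInFile = 0;
        }
        txnsInFile++;
        bytesInFile += txnBytes;
        return roll;
    }
}
```

Checking the size before appending (rather than after) keeps each file at or under the limit, except for a single oversized transaction.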
328635 No Perforce job exists for this issue. 0 328978
6 years, 44 weeks, 5 days ago 0|i1kp8v:
ZooKeeper ZOOKEEPER-1708

Wrong version of java in control file for deb packages

Bug Resolved Minor Won't Fix Johan Hillertz Johan Hillertz Johan Hillertz 17/May/13 11:01   03/Mar/16 11:23 03/Mar/16 11:23 3.4.5       0 3   After building the deb package it is not installable because of missing dependencies in the control file.

Path:
src/packages/deb/zookeeper.control/control

If I remember correctly, the package 'sun-java6-jre' is no longer provided by Ubuntu.

If it is possible to run ZooKeeper on OpenJDK, the correct string in the control file should be:

"Depends: openjdk-6-jre"
Or
"Depends: openjdk-7-jre"

328544 No Perforce job exists for this issue. 1 328888
4 years, 3 weeks ago 0|i1koov:
ZooKeeper ZOOKEEPER-1707

Incorrect documentation of build dependencies for deb and rpm packages.

Bug Resolved Minor Won't Fix Chris Nauroth Johan Hillertz Johan Hillertz 17/May/13 10:16   03/Mar/16 11:23 03/Mar/16 11:23 3.4.5   documentation   0 3   I failed to build a deb package by following the instructions, and found that the documentation in 'README_packaging.txt' for building Ubuntu packages can be improved. I have attached a suggested patch.

Tested on Ubuntu 12.04 LTS
documentation 328540 No Perforce job exists for this issue. 2 328884
4 years, 3 weeks ago 0|i1konz:
ZooKeeper ZOOKEEPER-1706

Typo in Double Barriers example

Bug Closed Minor Fixed Jingguo Yao Jingguo Yao Jingguo Yao 13/May/13 02:05   13/Mar/14 14:17 13/May/13 03:34 3.4.5 3.4.6, 3.5.0 documentation 14/May/13 0 4   For the Double Barriers example in the "ZooKeeper Recipes and Solutions" page, the P should be L in line 4 of the Leave pseudo code. 327608 No Perforce job exists for this issue. 1 327952
6 years, 2 weeks ago 0|i1kiwv:
ZooKeeper ZOOKEEPER-1705

Certain implementations of C's rand() function coupled with the shuffle in libzookeeper_mt's getaddrs() produce a biased distribution of connections.

Bug Open Minor Unresolved Unassigned Stephen Tyree Stephen Tyree 10/May/13 14:56   10/May/13 14:56       c client   0 1   Using libzookeeper_mt on an unsupported platform (OpenVMS) with a 5-server connection string, the fourth server in the connection string gets selected only about 6% of the time, versus the 20% expected from a uniform choice. This appears to be due to some strange properties of the LCG used in OpenVMS's C rand() function. Linux does not exhibit this behavior, but I can't speak for Windows, BSD, etc.

If libzookeeper_mt's behavior is intended to be the same on every platform it runs on (not that OpenVMS is one of those platforms), it would be prudent to use a PRNG of its own choosing. Integrating a well-defined PRNG, such as the Mersenne Twister, would give all platforms the same, correct behavior.
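The point that the shuffle should not depend on the platform's `rand()` can be illustrated with a Fisher-Yates shuffle that takes its generator as a parameter. This Java sketch uses `java.util.Random` (a 48-bit LCG, not the Mersenne Twister the issue suggests) only to show the structure; `HostShuffle` is a hypothetical name, and this is not the C client's `getaddrs()` code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical helper: shuffles a host list with an explicitly chosen generator. */
final class HostShuffle {
    /**
     * Fisher-Yates shuffle. Random.nextInt(bound) draws uniformly from [0, bound),
     * so every permutation is equally likely given a good generator.
     */
    static <T> List<T> shuffled(List<T> hosts, Random rng) {
        List<T> out = new ArrayList<>(hosts);
        for (int i = out.size() - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1); // uniform in [0, i]
            T tmp = out.get(i);
            out.set(i, out.get(j));
            out.set(j, tmp);
        }
        return out;
    }
}
```

Passing the generator in keeps the distribution a property of the library rather than of the platform, and allows a seeded generator in tests.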
327433 No Perforce job exists for this issue. 0 327777
6 years, 45 weeks, 6 days ago 0|i1khtz:
ZooKeeper ZOOKEEPER-1704

Please add download link for tutorial

New Feature Open Trivial Unresolved Unassigned Hayden Schultz Hayden Schultz 09/May/13 17:04   09/May/13 17:04   3.4.5   documentation   0 1   http://zookeeper.apache.org/doc/r3.2.2/zookeeperTutorial.html There's no obvious way to download the source file other than copy/paste. 327238 No Perforce job exists for this issue. 0 327582
6 years, 46 weeks ago 0|i1kgmn:
ZooKeeper ZOOKEEPER-1703

Please add instructions for running the tutorial

New Feature Resolved Minor Fixed Andor Molnar Hayden Schultz Hayden Schultz 09/May/13 17:03   13/Oct/17 19:57 13/Oct/17 19:22 3.4.5 3.4.11, 3.5.4, 3.6.0 documentation   0 5   tutorial http://zookeeper.apache.org/doc/r3.2.2/zookeeperTutorial.html There are no instructions for running the tutorial. newbie 327237 No Perforce job exists for this issue. 0 327581
2 years, 22 weeks, 6 days ago 0|i1kgmf:
ZooKeeper ZOOKEEPER-1702

ZooKeeper client may write operation packets before receiving successful response to connection request, can cause TCP RST

Bug Closed Major Fixed Chris Nauroth Chris Nauroth Chris Nauroth 09/May/13 16:24   13/Mar/14 14:17 01/Jul/13 19:24 3.4.2 3.4.6, 3.5.0 java client   0 10   The problem occurs when a connection attempt is pending and there are multiple outbound packets in the queue for other operations. In {{ClientCnxnSocketNIO#doIO}}, it is possible to receive notification that the socket is writable for the next operation packet before receiving notification that the socket is readable for the connection response from the server. If the server decides that the session is expired, then it responds by immediately closing the socket on its side. If the client has written packets after the server has closed its end of the socket, then the TCP stack may choose to abort the connection with an RST. When this happens, the client doesn't receive an orderly shutdown, and ultimately it fails to deliver a session expired event to the application. 327217 No Perforce job exists for this issue. 1 327561
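The ordering the fix needs (write nothing but the connect request until the server's response has been read) can be modeled as a small state machine. This Java sketch is hypothetical (`OutgoingQueue`, `drainWritable`, `onConnectResponse` are illustrative names, with strings standing in for packets) and does not reproduce `ClientCnxnSocketNIO#doIO`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical model of the client's outgoing queue during connection setup. */
final class OutgoingQueue {
    private final Deque<String> pending = new ArrayDeque<>();
    private boolean connectRequestSent = false;
    private boolean sessionEstablished = false;

    void enqueue(String packet) {
        pending.addLast(packet);
    }

    /**
     * Called when the socket is writable. Before the server's connect response
     * has been read, only the connect request may go out; operation packets wait.
     */
    List<String> drainWritable() {
        List<String> written = new ArrayList<>();
        if (!connectRequestSent) {
            written.add("ConnectRequest");
            connectRequestSent = true;
            return written;
        }
        if (!sessionEstablished) {
            return written; // hold operation packets until the session is confirmed
        }
        while (!pending.isEmpty()) {
            written.add(pending.removeFirst());
        }
        return written;
    }

    /** Called after the connect response has been read and accepted. */
    void onConnectResponse() {
        sessionEstablished = true;
    }
}
```

If the server expires the session and closes the socket, no operation packets have been written after the connect request, so there is nothing that could provoke an RST before the client reads the server's orderly close.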
6 years, 2 weeks ago
Reviewed
0|i1kghz:
ZooKeeper ZOOKEEPER-1701

When new and old config have the same version, no need to write new config to disk or create new connections

Improvement Resolved Minor Fixed Alexander Shraer Alexander Shraer Alexander Shraer 08/May/13 20:53   01/Apr/14 07:10 31/Mar/14 21:46 3.5.0 3.5.0 server   0 3   setLastSeenQuorumVerifier in QuorumPeer.java always writes the new config to disk and tries to make new connections to servers in new config. When the new config has the same version as the committed one (e.g., when the config received in a NEWLEADER message is already known to the follower), there's no need to write it to disk or to create new connections. 327090 No Perforce job exists for this issue. 2 327434
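A hedged Java sketch of the proposed short-circuit: if the incoming config version equals the last one seen, skip both the disk write and the reconnection. `ConfigTracker` is hypothetical, and its counters stand in for the real side effects in `setLastSeenQuorumVerifier`.

```java
/** Hypothetical model of the version check proposed for setLastSeenQuorumVerifier. */
final class ConfigTracker {
    private long lastSeenVersion = -1;
    private int diskWrites = 0;
    private int reconnects = 0;

    void setLastSeen(long version) {
        if (version == lastSeenVersion) {
            return; // config already known: no disk write, no new connections
        }
        lastSeenVersion = version;
        diskWrites++; // stands in for writing the new config to disk
        reconnects++; // stands in for connecting to servers in the new config
    }

    int diskWrites() {
        return diskWrites;
    }

    int reconnects() {
        return reconnects;
    }
}
```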
5 years, 51 weeks, 2 days ago 0|i1kfpr:
ZooKeeper ZOOKEEPER-1700

FLETest consistently failing - setLastSeenQuorumVerifier seems to be hanging

Bug Resolved Critical Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 07/May/13 20:43   12/May/13 07:09 11/May/13 08:51 3.5.0 3.5.0 quorum   0 5   I'm consistently seeing a failure on my laptop when running the FLETest "testJoin" test. What seems to be happening is that the call to setLastSeenQuorumVerifier is hanging.

See the following log from the test, and notice 17:35:57 for the period in question. Note that I turned on debug logging and added a few log messages around the call to setLastSeenQuorumVerifier (you can see the code enter but never leave).

Note: I've applied ZOOKEEPER-1324 to trunk code and then run this test but that doesn't seem to help. Also note that this test is passing consistently when run against branch-3.4.

{noformat}
2013-05-07 17:35:57,859 [myid:] - INFO [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:11221:Follower@65] - FOLLOWING - LEADER ELECTION TOOK - 16
2013-05-07 17:35:57,859 [myid:] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@436] - LEADING - LEADER ELECTION TOOK - 17
2013-05-07 17:35:57,863 [myid:] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:FileTxnSnapLog@270] - Snapshotting: 0x0 to /home/phunt/dev/zookeeper-trunk/build/test/tmp/test3690487600947307322.junit.dir/version-2/snapshot.0
2013-05-07 17:35:57,873 [myid:] - INFO [LearnerHandler-/127.0.0.1:34262:LearnerHandler@269] - Follower sid: 0 : info : 0.0.0.0:11222:11223:participant;0.0.0.0:11221
2013-05-07 17:35:57,878 [myid:] - INFO [LearnerHandler-/127.0.0.1:34262:LearnerHandler@328] - Synchronizing with Follower sid: 0 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x0
2013-05-07 17:35:57,878 [myid:] - DEBUG [LearnerHandler-/127.0.0.1:34262:LearnerHandler@395] - committedLog is empty but leader and follower are in sync, zxid=0x0
2013-05-07 17:35:57,878 [myid:] - INFO [LearnerHandler-/127.0.0.1:34262:LearnerHandler@404] - Sending DIFF
2013-05-07 17:35:57,879 [myid:] - DEBUG [LearnerHandler-/127.0.0.1:34262:LearnerHandler@411] - Sending NEWLEADER message to 0
2013-05-07 17:35:57,880 [myid:] - INFO [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:11221:Learner@331] - Getting a diff from the leader 0x0
2013-05-07 17:35:57,885 [myid:] - INFO [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:11221:Learner@457] - Learner received NEWLEADER message
2013-05-07 17:35:57,885 [myid:] - INFO [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:11221:Learner@460] - NEWLEADER calling configfromstring
2013-05-07 17:35:57,885 [myid:] - INFO [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:11221:Learner@462] - NEWLEADER setting quorum verifier
2013-05-07 17:35:57,886 [myid:] - WARN [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:11221:QuorumPeer@1218] - setLastSeenQuorumVerifier called with stale config 0. Current version: 0
2013-05-07 17:36:01,880 [myid:] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@585] - Shutting down
2013-05-07 17:36:01,881 [myid:] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@591] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Waiting for a quorum of followers, only synced with sids: [ [1] ]
at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:591)
at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:487)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:949)
2013-05-07 17:36:01,881 [myid:] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:ZooKeeperServer@398] - shutting down
2013-05-07 17:36:01,881 [myid:] - INFO [LearnerCnxAcceptor-0.0.0.0/0.0.0.0:11225:Leader$LearnerCnxAcceptor@398] - exception while shutting down acceptor: java.net.SocketException: Socket closed
2013-05-07 17:36:01,882 [myid:] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:QuorumPeer@979] - PeerState set to LOOKING
2013-05-07 17:36:01,882 [myid:] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:QuorumPeer@863] - LOOKING
2013-05-07 17:36:01,883 [myid:] - DEBUG [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:QuorumPeer@792] - Initializing leader election protocol...
{noformat}
326895 No Perforce job exists for this issue. 2 327240
6 years, 45 weeks, 4 days ago 0|i1ke8n:
ZooKeeper ZOOKEEPER-1699

Leader should timeout and give up leadership when losing quorum of last proposed configuration

Bug Resolved Blocker Fixed Alexander Shraer Alexander Shraer Alexander Shraer 03/May/13 18:47   21/May/14 18:54 21/May/14 13:49 3.5.0 3.5.0 server   0 10   A leader gives up leadership when it loses a quorum of the current configuration. This doesn't take into account any proposed configuration. So, if a reconfig operation is in progress and a quorum of the new configuration is not responsive, the leader will just get stuck waiting for it to ACK the reconfig operation, and will never time out.
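A hedged Java sketch of the rule the issue asks for, assuming simple majority quorums (ZooKeeper's quorum verifiers can also be hierarchical or weighted, which this ignores): the leader keeps leading only while it has a quorum of the current configuration and, if a reconfig is pending, of the proposed one as well. `LeaderQuorumCheck` and its methods are hypothetical names.

```java
import java.util.Set;

/** Hypothetical leadership check over server ids, assuming simple majorities. */
final class LeaderQuorumCheck {
    /** True if more than half of the voting members have acked recently. */
    static boolean hasMajority(Set<Long> voters, Set<Long> ackedRecently) {
        long acked = voters.stream().filter(ackedRecently::contains).count();
        return acked * 2 > voters.size();
    }

    /**
     * Keep leading only with a quorum of the current configuration and, when a
     * reconfiguration is pending (proposedOrNull != null), of the proposed one too.
     */
    static boolean shouldKeepLeading(Set<Long> current, Set<Long> proposedOrNull, Set<Long> acked) {
        if (!hasMajority(current, acked)) {
            return false;
        }
        return proposedOrNull == null || hasMajority(proposedOrNull, acked);
    }
}
```

For example, with current configuration {1, 2, 3}, proposed configuration {3, 4, 5}, and acks only from {1, 2, 3}, the current quorum holds but the proposed one does not, so the leader gives up instead of waiting forever.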
326416 No Perforce job exists for this issue. 9 326761
5 years, 44 weeks, 1 day ago 0|i1kbaf:
ZooKeeper ZOOKEEPER-1698

Add deterministic host connection to Java client

Improvement Open Minor Unresolved Unassigned Owen Kim Owen Kim 01/May/13 21:04   26/Feb/14 17:45           2 4   "C client has zoo_deterministic_conn_order() to make the connection
order deterministic. We can add a similar feature to Java client."
326094 No Perforce job exists for this issue. 0 326439
6 years, 4 weeks, 1 day ago 0|i1k9av:
ZooKeeper ZOOKEEPER-1697

large snapshots can cause continuous quorum failure

Bug Closed Critical Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 30/Apr/13 20:38   13/Mar/14 14:17 11/May/13 09:35 3.4.3, 3.5.0 3.4.6, 3.5.0 server   0 12   I keep seeing this on the leader:

2013-04-30 01:18:39,754 INFO
org.apache.zookeeper.server.quorum.Leader: Shutdown called
java.lang.Exception: shutdown Leader! reason: Only 0 followers, need 2
at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:447)
at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:422)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:753)

The followers are downloading the snapshot when this happens, and are
trying to do their first ACK to the leader, the ack fails with broken
pipe.

In this case the snapshots are large and the config has increased the
initLimit. syncLimit is small - 10 or so with ticktime of 2000. Note
this is 3.4.3 with ZOOKEEPER-1521 applied.

I originally speculated that
https://issues.apache.org/jira/browse/ZOOKEEPER-1521 might be related.
I thought I might have broken something for this environment. That
doesn't look to be the case.

As it looks now it seems that 1521 didn't go far enough. The leader
verifies that all followers have ACK'd to the leader within the last
"syncLimit" time period. This runs all the time in the background on
the leader to identify the case where a follower drops. In this case
the followers take so long to load the snapshot that this check fails
the very first time, as a result the leader drops (not enough ack'd
followers w/in the sync limit) and re-election happens. This repeats
forever. (the above error)

this is the call:
org.apache.zookeeper.server.quorum.LearnerHandler.synced() that's at
odds.

look at setting of tickOfLastAck in
org.apache.zookeeper.server.quorum.LearnerHandler.run()
It's not set until the follower first acks - in this case I can see
that the followers are not getting to the ack prior to the leader
shutting down due to the error log above.

It seems that synced() should probably use the init limit until the
first ack comes in from the follower. I also see that, while tickOfLastAck and leader.self.tick are shared between two threads, there is no synchronization of the shared resources.
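The suggested change (use initLimit until the first ack arrives, and syncLimit afterward) can be sketched in Java as below. `LearnerLiveness` is a hypothetical class, not `LearnerHandler`; ticks are abstract counters, and `AtomicLong` is one way to address the missing synchronization on the shared last-ack tick.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical model of the leader's per-follower liveness check. */
final class LearnerLiveness {
    private static final long NO_ACK_YET = -1;
    private final int initLimit;
    private final int syncLimit;
    private final long tickAtStart;
    // Written by the handler thread, read by the leader's checking thread.
    private final AtomicLong tickOfLastAck = new AtomicLong(NO_ACK_YET);

    LearnerLiveness(int initLimit, int syncLimit, long tickAtStart) {
        this.initLimit = initLimit;
        this.syncLimit = syncLimit;
        this.tickAtStart = tickAtStart;
    }

    void onAck(long currentTick) {
        tickOfLastAck.set(currentTick);
    }

    /** Before the first ack, allow initLimit ticks (snapshot transfer and load); afterward, syncLimit. */
    boolean synced(long currentTick) {
        long last = tickOfLastAck.get();
        if (last == NO_ACK_YET) {
            return currentTick <= tickAtStart + initLimit;
        }
        return currentTick <= last + syncLimit;
    }
}
```

With initLimit = 50 and syncLimit = 10, a follower still loading a large snapshot at tick 30 is considered synced, whereas checking against syncLimit alone would already have dropped it.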
325927 No Perforce job exists for this issue. 6 326272
6 years, 2 weeks ago 0|i1k89r:
ZooKeeper ZOOKEEPER-1696

Fail to run zookeeper client on Weblogic application server

Bug Closed Critical Fixed Jeffrey Zhong Dmitry Konstantinov Dmitry Konstantinov 24/Apr/13 09:39   13/Mar/14 14:16 27/Sep/13 19:26 3.4.5 3.4.6, 3.5.0 java client   6 12   Java version: jdk170_06
WebLogic Server Version: 10.3.6.0
The problem is described in detail here: http://comments.gmane.org/gmane.comp.java.zookeeper.user/2897
The linked thread also references a fix implementation.

{noformat}
####<Apr 24, 2013 1:03:28 PM MSK> <Warning> <org.apache.zookeeper.ClientCnxn> <devapp090> <clust2> <[ACTIVE] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (devapp090:2182)> <internal> <> <> <1366794208810> <BEA-000000> <WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.lang.IllegalArgumentException: No Configuration was registered that can handle the configuration named Client
at com.bea.common.security.jdkutils.JAASConfiguration.getAppConfigurationEntry(JAASConfiguration.java:130)
at org.apache.zookeeper.client.ZooKeeperSaslClient.<init>(ZooKeeperSaslClient.java:97)
at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:943)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:993)
>

{noformat}
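The log shows WebLogic's `JAASConfiguration.getAppConfigurationEntry` throwing `IllegalArgumentException` for an unknown section name, whereas `javax.security.auth.login.Configuration.getAppConfigurationEntry` is documented to return null when there are no entries for the name. A client could treat both outcomes the same way. This Java helper is a hedged sketch (`JaasLookup` and `entriesOrNull` are hypothetical names), not the fix referenced in the linked thread.

```java
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;

/** Hypothetical helper for looking up a JAAS section such as "Client". */
final class JaasLookup {
    /**
     * Returns the entries for the named section, or null when the section is absent.
     * Some containers throw IllegalArgumentException instead of returning null,
     * so that case is mapped to "not configured" as well.
     */
    static AppConfigurationEntry[] entriesOrNull(Configuration config, String name) {
        try {
            return config.getAppConfigurationEntry(name);
        } catch (IllegalArgumentException e) {
            return null;
        }
    }
}
```

A caller can then fall back to an unauthenticated connection when the result is null, instead of failing the connection attempt.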
324722 No Perforce job exists for this issue. 3 325067
6 years, 2 weeks ago 0|i1k0uf:
ZooKeeper ZOOKEEPER-1695

Inconsistent error code and type for new errors introduced by dynamic reconfiguration

Bug Resolved Blocker Fixed Michi Mutsuzaki Thawan Kooburat Thawan Kooburat 23/Apr/13 19:28   30/Apr/14 06:33 29/Apr/14 22:34 3.5.0 3.5.0 server   0 6   In KeeperException.Code, RECONFIGINPROGRESS and NEWCONFIGNOQUORUM are declared as system errors. However, their error codes suggest that they are API errors.

We need to either move them to the right type or assign them codes from the right range.
324610 No Perforce job exists for this issue. 4 324955
5 years, 47 weeks, 1 day ago 0|i1k05j:
ZooKeeper ZOOKEEPER-1694

ZooKeeper Leader sends a repeated NEWLEADER quorum packet to followers

Bug Resolved Minor Duplicate Unassigned Germán Blanco Germán Blanco 22/Apr/13 06:08   22/Apr/13 16:02 22/Apr/13 08:43 3.4.5, 3.5.0 3.5.0 quorum   0 3   Windows, Linux, MacOSX At least, this is what the logs seem to show. This also seems to cause a second snapshot on the follower. patch 324273 No Perforce job exists for this issue. 1 324618
6 years, 48 weeks, 3 days ago 0|i1jy33:
ZooKeeper ZOOKEEPER-1693

process may core or hang when xid is overflowed

Bug Resolved Major Duplicate Unassigned Jacky007 Jacky007 19/Apr/13 08:14   09/Oct/13 02:38 09/Oct/13 02:38 3.4.5   c client, java client   0 1   When the xid overflows, it can be confused with AUTHXID (-4).

If the process sends 4000 requests per second, it may dump core or hang after about ten days.
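For scale: a 32-bit signed counter has 2^31 positive values, which at 4000 requests per second are used up in about 2^31 / 4000 ≈ 536,871 seconds, roughly 6.2 days. One way to avoid colliding with reserved negative xids is to wrap back to 1 instead of overflowing. The Java sketch below is hypothetical (`XidAllocator` is an illustrative name) and assumes that no request with a reused xid is still outstanding by the time the counter wraps.

```java
/** Hypothetical xid allocator that never hands out zero or negative (reserved) xids. */
final class XidAllocator {
    private int next;

    XidAllocator(int start) {
        if (start <= 0) {
            throw new IllegalArgumentException("start must be positive");
        }
        this.next = start;
    }

    synchronized int nextXid() {
        int xid = next;
        // Wrap back to 1 instead of overflowing into the negative range,
        // where special values such as AUTHXID (-4) live.
        next = (next == Integer.MAX_VALUE) ? 1 : next + 1;
        return xid;
    }
}
```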
323946 No Perforce job exists for this issue. 0 324291
6 years, 24 weeks, 1 day ago 0|i1jw2f:
ZooKeeper ZOOKEEPER-1692

Add support for single member ensemble

Improvement Open Minor Unresolved Thawan Kooburat Thawan Kooburat Thawan Kooburat 15/Apr/13 18:49   08/Apr/17 05:34   3.4.0   quorum   0 5   In the past, we have more than once run into problems where a quorum could not be formed. It takes a while to investigate the root cause and fix the problem.

Our current solution is to make it possible to run a quorum with a single member in it. Unlike standalone mode, it has to run as a LeaderZooKeeperServer, so that the observers can connect to it.

This allows the operator to use this workaround to bring the ensemble back quickly while investigating the problem in the background.

The main problem here is to allow the observers to connect to the leader when the quorum size is reduced to one. We don't want to update the (static) configuration on the observers, since that requires a server restart. We are thinking of allowing an observer to connect to any participant that declares itself the leader, without running the leader election algorithm (because it won't have enough votes).
323043 No Perforce job exists for this issue. 0 323388
2 years, 49 weeks, 5 days ago 0|i1jqhr:
ZooKeeper ZOOKEEPER-1691

Add a flag to disable standalone mode

Improvement Resolved Major Fixed Helen Hastings Michi Mutsuzaki Michi Mutsuzaki 15/Apr/13 15:41   28/Jan/14 13:47 20/Jan/14 23:46   3.5.0 quorum   3 9   Currently you cannot use dynamic reconfiguration to bootstrap a ZooKeeper cluster, because the server goes into standalone mode when there is only one server in the cluster.

--Michi
323014 No Perforce job exists for this issue. 8 323359
6 years, 9 weeks, 1 day ago 0|i1jqbb:
ZooKeeper ZOOKEEPER-1690

Race condition when close sock may cause a NPE in sendBuffer

Bug Open Major Unresolved Unassigned Jacky007 Jacky007 15/Apr/13 07:53   15/Apr/13 07:59   3.4.6       0 2   In NIOServerCnxn.java:
{code}
public void close() {
    closeSock();
    ...
    sk.cancel();
{code}

close() closes the socket first, and only then cancels the selection key.

{code}
public void sendBuffer(ByteBuffer bb) {
    if ((sk.interestOps() & SelectionKey.OP_WRITE) == 0) {
        ...
        sock.write(bb);
{code}

sendBuffer() first reads the interest ops of the selection key, and then uses sock (which may be null).

I have noticed that the 3.5.0 branch has fixed the problem.
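A hedged Java sketch of one way to close the race: make close() and sendBuffer() agree on the socket under a common lock, and have sendBuffer() check for a closed connection instead of dereferencing a null field. `GuardedConnection` is hypothetical and uses a plain `WritableByteChannel`; the fix in the 3.5.0 branch is not reproduced here.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

/** Hypothetical connection wrapper showing one way to avoid the close/send race. */
final class GuardedConnection {
    private final Object lock = new Object();
    private WritableByteChannel sock; // set to null by close()

    GuardedConnection(WritableByteChannel sock) {
        this.sock = sock;
    }

    /** Returns false instead of throwing a NullPointerException when already closed. */
    boolean sendBuffer(ByteBuffer bb) throws IOException {
        synchronized (lock) {
            if (sock == null) {
                return false;
            }
            sock.write(bb);
            return true;
        }
    }

    void close() throws IOException {
        synchronized (lock) {
            if (sock != null) {
                sock.close();
                sock = null;
            }
        }
    }
}
```

Holding the lock during the write is acceptable for a non-blocking channel; with a blocking channel, a design that copies the reference under the lock and handles `ClosedChannelException` may be preferable.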
322937 No Perforce job exists for this issue. 0 323282
6 years, 49 weeks, 3 days ago 0|i1jpu7:
ZooKeeper ZOOKEEPER-1689

Remove JVMFLAGS completely from clients, if CLIENT_JVMFLAGS are also set

Bug Open Minor Unresolved Unassigned Jeff Lord Jeff Lord 12/Apr/13 11:30   06/Feb/17 16:24   3.4.5   scripts   0 2   In zkCli.sh, the CLIENT_JVMFLAGS are being passed along with regular JVMFLAGS, so the latter ends up overriding it anyhow if set. Can we please remove JVMFLAGS completely from clients, if CLIENT_JVMFLAGS are also set (i.e. use just one).

One example of how this can be detrimental: you start a zookeeper-client session on a host that is already running ZooKeeper and use the default config directory. If the ZooKeeper server has JMX enabled, then the client will also pick up that port and attempt to bind to it, resulting in a failure:

{noformat}
# /usr/bin/zookeeper-client
Error: Exception thrown by the agent : java.rmi.server.ExportException: Port already in use: 9010; nested exception is:
java.net.BindException: Address already in use
{noformat}
322669 No Perforce job exists for this issue. 0 323014
3 years, 6 weeks, 3 days ago 0|i1jo6v:
ZooKeeper ZOOKEEPER-1688

Transparent encryption of on-disk files

New Feature Open Major Unresolved Unassigned Andrew Kyle Purtell Andrew Kyle Purtell 10/Apr/13 15:28   07/May/14 10:35   3.5.0       0 9   We propose to introduce optional transparent encryption of snapshots and commit logs on disk. The goal is to protect against the leakage of sensitive information from files at rest, due to accidental misconfiguration of filesystem permissions, improper decommissioning, or improper disk disposal. This change would introduce a new ServerConfig option that allows the administrator to select the desired persistence implementation by classname, and new persistence classes extending the File* classes that wrap current formats in encrypted containers. Otherwise and by default the current File* classes will be used without change. If enabled, transparent encryption of all on disk structures will be accomplished with a shared cluster key made available to the quorum peers via the Java Keystore (supporting various store options, including hardware security module integration). Small modifications to the LogFormatter and SnapshotFormatter utilities will be needed. A new utility for offline key rotation will also be provided.

These changes will not introduce any new dependencies. The standard Java Cryptography Extension (JCE) is sufficient for the implementation and can benefit from acceleration options that JCE provides now or in the future.
322312 No Perforce job exists for this issue. 1 322657
5 years, 46 weeks, 1 day ago 0|i1jlzj:
ZooKeeper ZOOKEEPER-1687

Number of past transactions retained in ZKDatabase.committedLog should be configurable

Improvement Open Minor Unresolved Unassigned Maho NAKATA Maho NAKATA 08/Apr/13 12:55   15/Apr/13 07:57           0 2   ZKDatabase.committedLog retains the past 500 transactions. Since memory usage can matter more than speed in some deployments, and vice versa, this should be configurable. 321814 No Perforce job exists for this issue. 0 322159
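A hedged Java sketch of a configurable bound: a deque that evicts the oldest transaction once it holds `capacity` entries, with the capacity read from a system property and defaulting to 500. `CommittedLog` here is a hypothetical stand-in, and the property name `example.committedLog.capacity` is made up for illustration; it is not an existing ZooKeeper setting.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical bounded history of recent transactions. */
final class CommittedLog<T> {
    /** Illustrative property name only, not an existing ZooKeeper setting. */
    static final String CAPACITY_PROPERTY = "example.committedLog.capacity";
    static final int DEFAULT_CAPACITY = 500;

    private final int capacity;
    private final Deque<T> entries = new ArrayDeque<>();

    CommittedLog(int capacity) {
        if (capacity < 0) {
            throw new IllegalArgumentException("capacity must be >= 0");
        }
        this.capacity = capacity;
    }

    /** Reads the capacity from the system property, falling back to the default. */
    static <E> CommittedLog<E> fromSystemProperties() {
        return new CommittedLog<>(Integer.getInteger(CAPACITY_PROPERTY, DEFAULT_CAPACITY));
    }

    void add(T txn) {
        if (capacity == 0) {
            return; // history disabled: minimum memory, no fast sync from memory
        }
        if (entries.size() == capacity) {
            entries.removeFirst(); // evict the oldest transaction
        }
        entries.addLast(txn);
    }

    int capacity() {
        return capacity;
    }

    int size() {
        return entries.size();
    }

    T oldest() {
        return entries.peekFirst();
    }
}
```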
6 years, 49 weeks, 3 days ago memory, transactions 0|i1jiwv:
ZooKeeper ZOOKEEPER-1686

Publish ZK 3.4.5 test jar

Bug Resolved Major Fixed Patrick D. Hunt Todd Lipcon Todd Lipcon 05/Apr/13 18:17   03/Oct/13 19:54 29/Sep/13 23:58 3.4.5 3.4.4, 3.4.5 build, tests   0 6   ZooKeeper 3.4.2 used to publish a jar with the tests classifier for use by downstream project tests. It seems this didn't get published for 3.4.4 or 3.4.5 (see https://repository.apache.org/index.html#nexus-search;quick~org.apache.zookeeper). Would someone mind please publishing these artifacts? 321556 No Perforce job exists for this issue. 0 321901
6 years, 25 weeks, 3 days ago
Reviewed
0|i1jhbj:
ZooKeeper ZOOKEEPER-1685

Zookeeper client hard codes the server principal to zookeeper

Bug Open Major Unresolved Unassigned Arpit Gupta Arpit Gupta 05/Apr/13 17:03   30/May/19 11:38   3.4.5       0 3   Noticed this while debugging a secure deployment. The server was started with the principal zk/_HOST.

When a client tried to connect to it, the client attempted to set up a secure connection to server zookeeper/_HOST and failed authentication.

In ClientCnxn.java

{code}
try {
    zooKeeperSaslClient = new ZooKeeperSaslClient("zookeeper/" + addr.getHostName());
} catch (LoginException e) {
    ...
}
{code}
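A small sketch of deriving the principal from configuration instead of a literal. The helper class is hypothetical, and the property name zookeeper.sasl.client.username is an assumption here; verify it against the documentation of the ZooKeeper release in use.

```java
// Hypothetical helper: build the SASL server principal from a configurable service name.
final class ServerPrincipalSketch {
    static final String SERVICE_PROPERTY = "zookeeper.sasl.client.username"; // assumed name
    static final String DEFAULT_SERVICE = "zookeeper";                       // today's literal

    // Uses the configured service name, or the historical default when none is set.
    static String serverPrincipal(String configuredService, String hostName) {
        String service = (configuredService == null || configuredService.isEmpty())
                ? DEFAULT_SERVICE
                : configuredService;
        return service + "/" + hostName;
    }

    static String serverPrincipal(String hostName) {
        return serverPrincipal(System.getProperty(SERVICE_PROPERTY), hostName);
    }
}
```

The constructor call in ClientCnxn would then become something like new ZooKeeperSaslClient(ServerPrincipalSketch.serverPrincipal(addr.getHostName())), and a server running as zk/_HOST would be reached by setting the property to zk.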
321551 No Perforce job exists for this issue. 0 321896
42 weeks ago 0|i1jhaf:
ZooKeeper ZOOKEEPER-1684

Failure to update socket addresses on immediate connection

Bug Open Major Unresolved Unassigned Shevek Shevek 05/Apr/13 16:43   05/Feb/20 07:15     3.7.0, 3.5.8     0 1   I quote:

{code}
void registerAndConnect(SocketChannel sock, InetSocketAddress addr)
        throws IOException {
    sockKey = sock.register(selector, SelectionKey.OP_CONNECT);
    boolean immediateConnect = sock.connect(addr);
    if (immediateConnect) {
        sendThread.primeConnection();
    }
}
{code}

In the immediate case, there are several bugs:

a) updateSocketAddresses() is never called, whereas it is called from the select loop in doTransport(). This means that clientCnxnSocket.getRemoteSocketAddress() will presumably return null for the lifetime of this socket.
b) OP_CONNECT is still in the interest set for the socket.
c) updateLastSendAndHeard() is never called either.
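A sketch of the fix direction implied by points a) to c): route the immediate-connect case through the same bookkeeping the select loop performs. This is a self-contained stand-in, not ClientCnxnSocketNIO; the field names and the onConnected() helper are hypothetical, and primeConnection() is reduced to a flag.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

// Hypothetical stand-in showing the same post-connect steps on both connect paths.
final class ImmediateConnectSketch {
    private final Selector selector;
    SelectionKey sockKey;
    SocketAddress remoteAddress;  // what updateSocketAddresses() would record
    long lastSendAndHeardNanos;   // what updateLastSendAndHeard() would record
    boolean primed;               // stands in for sendThread.primeConnection()

    ImmediateConnectSketch(Selector selector) {
        this.selector = selector;
    }

    // sock must already be non-blocking, or register() throws IllegalBlockingModeException.
    void registerAndConnect(SocketChannel sock, InetSocketAddress addr) throws IOException {
        sockKey = sock.register(selector, SelectionKey.OP_CONNECT);
        if (sock.connect(addr)) {
            onConnected(sock); // immediate connect: no OP_CONNECT event will ever arrive
        }
    }

    // Called directly on immediate connect, or from the select loop after finishConnect().
    void onConnected(SocketChannel sock) throws IOException {
        remoteAddress = sock.getRemoteAddress();                          // a)
        sockKey.interestOps(SelectionKey.OP_READ | SelectionKey.OP_WRITE); // b) drop OP_CONNECT
        lastSendAndHeardNanos = System.nanoTime();                        // c)
        primed = true;
    }
}
```

Keeping one onConnected() path for both cases means any future post-connect step cannot be forgotten in only one of them.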
321550 No Perforce job exists for this issue. 1 321895
6 years, 49 weeks, 2 days ago 0|i1jha7:
ZooKeeper ZOOKEEPER-1683

ZooKeeper client NPE when updating server list on disconnected client

Bug Resolved Major Fixed Alexander Shraer Shevek Shevek 04/Apr/13 18:20   18/Jul/14 07:35 17/Jul/14 16:58 3.5.0 3.5.0 java client   0 8   2013-04-04 22:16:15,872 ERROR [pool-4-thread-1] com.netflix.curator.ConnectionState.getZooKeeper (ConnectionState.java:84) - Background exception caught
java.lang.NullPointerException
at org.apache.zookeeper.client.StaticHostProvider.updateServerList(StaticHostProvider.java:161) ~[zookeeper-3.5.0.jar:3.5.0--1]
at org.apache.zookeeper.ZooKeeper.updateServerList(ZooKeeper.java:183) ~[zookeeper-3.5.0.jar:3.5.0--1]
at com.netflix.curator.HandleHolder$1$1.setConnectionString(HandleHolder.java:121) ~[curator-client-1.3.5-SNAPSHOT.jar:?]


The faulty code is:

{code}
ClientCnxnSocket clientCnxnSocket = cnxn.sendThread.getClientCnxnSocket();
InetSocketAddress currentHost = (InetSocketAddress) clientCnxnSocket.getRemoteSocketAddress();
boolean reconfigMode = hostProvider.updateServerList(serverAddresses, currentHost);
{code}

currentHost may be null if the client is not yet connected, but StaticHostProvider.updateServerList dereferences it unconditionally.

This would be caught by findbugs.
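One possible null guard, sketched under the assumption that a client which is not yet connected has no current server to migrate away from, so the new list can simply be adopted. The class is hypothetical and deliberately ignores StaticHostProvider's probabilistic load rebalancing; it only shows where the null check belongs.

```java
import java.net.InetSocketAddress;
import java.util.List;

// Hypothetical sketch of a null-tolerant "should the client move?" decision.
final class ServerListUpdateSketch {
    // currentHost == null means "not connected yet": nothing to migrate away from.
    static boolean needsReconfig(List<InetSocketAddress> newServers, InetSocketAddress currentHost) {
        if (currentHost == null) {
            return false;
        }
        // Connected to a server that was removed from the list: the client must move.
        return !newServers.contains(currentHost);
    }
}
```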
321343 No Perforce job exists for this issue. 8 321688
5 years, 35 weeks, 6 days ago 0|i1jg0n:
ZooKeeper ZOOKEEPER-1682

Method to request all zookeepers in cluster

Improvement Resolved Minor Duplicate Unassigned John Vines John Vines 02/Apr/13 10:57   02/Apr/13 14:01 02/Apr/13 14:01         0 2   I would like to see an API feature to request the list of all servers in the cluster. The idea here is that a client doesn't have to know about all servers to benefit from the distributed nature of Zookeeper. If they can connect to one, they can have their bases covered from there on out. 320783 No Perforce job exists for this issue. 0 321124
6 years, 51 weeks, 2 days ago 0|i1jcjb:
ZooKeeper ZOOKEEPER-1681

ZooKeeper 3.4.x can optionally use netty for nio but the pom does not declare the dep as optional

Improvement Patch Available Major Unresolved Stevo Slavić John Sirois John Sirois 02/Apr/13 09:05   05/Feb/20 07:11   3.4.0, 3.4.1, 3.4.2, 3.4.4, 3.4.5 3.7.0, 3.5.8     3 5   For example in [3.4.5|http://search.maven.org/remotecontent?filepath=org/apache/zookeeper/zookeeper/3.4.5/zookeeper-3.4.5.pom] we see:
{code}
$ curl -sS http://search.maven.org/remotecontent?filepath=org/apache/zookeeper/zookeeper/3.4.5/zookeeper-3.4.5.pom | grep -B1 -A4 org.jboss.netty
<dependency>
<groupId>org.jboss.netty</groupId>
<artifactId>netty</artifactId>
<version>3.2.2.Final</version>
<scope>compile</scope>
</dependency>
{code}

As a consumer, I can depend on zookeeper with an exclude for org.jboss.netty#netty, or I can let my transitive dependency resolver pick a winner. This might be fine, except for those using a more modern netty published under the newer io.netty groupId. In that case you get both org.jboss.netty#netty;foo and io.netty#netty;bar on your classpath, and runtime errors ensue from incompatibilities unless you add an exclude against zookeeper (and, of course, don't enable ZooKeeper's netty NIO handling).

I'd argue this is a pom bug, although that is debatable. As currently packaged, zookeeper clearly needs netty to compile, but since it does not need netty to run, either the scope should be provided or optional, or a zookeeper-netty lib should be broken out as an optional dependency, and that new artifact published by zookeeper can have a proper compile dependency on netty.

320756 No Perforce job exists for this issue. 1 321097
1 year, 45 weeks ago 0|i1jcdb:
ZooKeeper ZOOKEEPER-1680

Cannot connect with a given sessionId - it is discarded

Bug Open Major Unresolved Unassigned Shevek Shevek 29/Mar/13 19:39   02/Apr/13 17:33           0 1   While the API permits construction of a ZooKeeper client object with a given sessionId, the sessionId can never be used:

ClientCnxn line 850: long sessId = (seenRwServerBefore) ? sessionId : 0;

The only code that sets seenRwServerBefore is onConnected().

Therefore, passing a sessionId into a ZooKeeper constructor appears to have no effect: the ClientCnxn has never seen an RW server before, so it discards the sessionId anyway.
320371 No Perforce job exists for this issue. 0 320712
6 years, 51 weeks, 6 days ago 0|i1j9zr:
ZooKeeper ZOOKEEPER-1679

c client: use -Wdeclaration-after-statement

Improvement Resolved Minor Fixed Michi Mutsuzaki Michi Mutsuzaki Michi Mutsuzaki 29/Mar/13 15:34   21/Aug/13 07:06 21/Aug/13 05:53 3.4.5 3.5.0 c client   0 3   Visual Studio still doesn't support C99.

--Michi
320340 No Perforce job exists for this issue. 1 320681
6 years, 31 weeks, 1 day ago 0|i1j9sv:
ZooKeeper ZOOKEEPER-1678

Server fails to join quorum when a peer is unreachable (5 ZK server setup)

Bug Open Major Unresolved Unassigned Julio Lopez Julio Lopez 28/Mar/13 20:19   09/Jul/13 05:43   3.4.5   leaderElection   1 7   java version "1.6.0_32"
Java(TM) SE Runtime Environment (build 1.6.0_32-b05)
Java HotSpot(TM) 64-Bit Server VM (build 20.7-b02, mixed mode)

Distributor ID: Ubuntu
Description: Ubuntu 12.04.1 LTS
Release: 12.04
Codename: precise

uname -a Linux ha-vani3-0 3.2.0-23-virtual #36-Ubuntu SMP Tue Apr 10 22:29:03 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
In a 5-node ZK cluster setup, in the following state:
* 1 host is down / not reachable.
* 4 hosts are up.
* 3 ZK servers are in quorum.
* a 4th ZK server was restarted and is trying to re-join the quorum.

The 4th server is unable to rejoin the quorum because the connection to the unreachable host cannot be established, and apparently takes too long to time out.

Stack traces and additional information coming.
320189 No Perforce job exists for this issue. 0 320530
6 years, 37 weeks, 2 days ago 0|i1j8vb:
ZooKeeper ZOOKEEPER-1677

Misuse of INET_ADDRSTRLEN

Bug Open Major Unresolved Marshall McMullen Shevek Shevek 28/Mar/13 17:00   28/Feb/20 16:56   3.5.0 3.7.0, 3.5.8     1 6   ZOOKEEPER-1355. Add zk.updateServerList(newServerList) (Alex Shraer, Marshall McMullen via fpj)



git-svn-id: https://svn.apache.org/repos/asf/zookeeper/trunk@1410731 13f79535-47bb-0310-9956-ffa450edef68


+int addrvec_contains(const addrvec_t *avec, const struct sockaddr_storage *addr)
+{
+ if (!avec || !addr)
+ {
+ return 0;
+ }
+
+ int i = 0;
+ for (i = 0; i < avec->count; i++)
+ {
+ if(memcmp(&avec->data[i], addr, INET_ADDRSTRLEN) == 0)
+ return 1;
+ }
+
+ return 0;
+}


I'm fairly sure that should be sizeof(struct sockaddr_storage). INET_ADDRSTRLEN is the size of the character buffer needed to hold the string that inet_ntop produces for an IPv4 address, so using it as the memcmp length here seems totally wrong.
320146 No Perforce job exists for this issue. 9 320487
1 year, 17 weeks ago 0|i1j8lr:
ZooKeeper ZOOKEEPER-1676

C client zookeeper_interest returning ZOK on Connection Loss

Bug Closed Blocker Not A Problem Yunong Xiao Yunong Xiao Yunong Xiao 28/Mar/13 13:09   04/Sep/16 00:57 22/May/16 18:18 3.4.3   c client   1 8   All, I have a fairly simple single-threaded C client set up (single-threaded
because we are embedding zk in the node.js/libuv runtime), which uses
the following algorithm:

zookeeper_interest(); select();
// perform zookeeper api calls
zookeeper_process();

I've noticed that zookeeper_interest in the C client never returns an error if it
is unable to connect to the zk server.

From the spec of the zookeeper_interest API, I see that zookeeper_interest is
supposed to return ZCONNECTIONLOSS when the client is disconnected from the server. However,
digging into the code, I see that the client is making a non-blocking connect
call
https://github.com/apache/zookeeper/blob/trunk/src/c/src/zookeeper.c#L1596-1613
, and returning ZOK
https://github.com/apache/zookeeper/blob/trunk/src/c/src/zookeeper.c#L1684

If we assume that the server is not up, this will mean that the subsequent
select() call would return 0, since the fd is not ready, and future calls to
zookeeper_interest will always return 0 and not the expected ZCONNECTIONLOSS.
Thus an upstream client will never be aware that the connection is lost.

I don't think this is the expected behavior. I have temporarily patched the zk
C client such that zookeeper_interest will return ZCONNECTIONLOSS if it's still
unable to connect after session_timeout has been exceeded.

I have included a patch for the client which fixes this for release 3.4.3 6b35e96 in this branch: https://github.com/yunong/zookeeper/tree/release-3.4.3-patched Here's the patch https://gist.github.com/yunong/efe869a0345867d54adf

For more information, please see this email thread. http://mail-archives.apache.org/mod_mbox/zookeeper-dev/201211.mbox/%3C11A8E7C3-4DDE-45D8-ABEC-A8A4D32CF647@gmail.com%3E
320094 No Perforce job exists for this issue. 0 320435
3 years, 43 weeks, 4 days ago Ok, let's resolve this one and document it. 0|i1j8a7:
ZooKeeper ZOOKEEPER-1675

Make sync a quorum operation

Bug Open Major Unresolved Michael Han Alexander Shraer Alexander Shraer 26/Mar/13 16:49   31/Oct/19 06:44   3.4.0, 3.5.0       0 8   sync + read is supposed to return at least the latest write that completes before the sync starts. This is true if the leader doesn't change, but when it does it may not work. The problem happens when the old leader L1 still thinks that it is the leader but some other leader L2 was already elected and committed some operations. Suppose that follower F is connected to L1 and invokes a sync. Even though L1 responds to the sync, the recent operations committed by L2 will not be flushed to F so a subsequent read on F will not see these operations.

To prevent this we should broadcast the sync like updates.

This problem is also mentioned in Section 4.4 of the ZooKeeper paper (but the solution proposed there is insufficient to solve the issue).
319651 No Perforce job exists for this issue. 0 319992
20 weeks, 2 days ago 0|i1j5jr:
ZooKeeper ZOOKEEPER-1674

There is no need to clear & load the database across leader election

Improvement Open Major Unresolved Unassigned Jacky007 Jacky007 21/Mar/13 10:25   26/Jan/17 17:23           0 4   Note this piece of code in QuorumPeer.java:

{code}
/* ZKDatabase is a top level member of quorumpeer
 * which will be used in all the zookeeperservers
 * instantiated later. Also, it is created once on
 * bootup and only thrown away in case of a truncate
 * message from the leader
 */
private ZKDatabase zkDb;
{code}

It was introduced by ZOOKEEPER-596. Currently, we drop the database on every leader election.

We can keep it safely with ZOOKEEPER-1549.
318719 No Perforce job exists for this issue. 0 319060
3 years, 50 weeks, 6 days ago 0|i1izsn:
ZooKeeper ZOOKEEPER-1673

ZooKeeper doesn't support CIDR expressions in ACLs with the ip scheme

Bug Resolved Minor Fixed Craig Condit Lipin Dmitriy Lipin Dmitriy 19/Mar/13 06:01   26/Apr/14 07:04 25/Apr/14 17:41 3.4.5 3.5.0     1 8   Currently, when I try to set an ACL with a CIDR expression, I get an exception:

{code}
[zk: localhost:2181(CONNECTED) 2] setAcl /AS0 ip:127.0.0.1/8:cdrwa
Exception in thread "main" org.apache.zookeeper.KeeperException$InvalidACLException: KeeperErrorCode = InvalidACL for /AS0
at org.apache.zookeeper.KeeperException.create(KeeperException.java:112)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.setACL(ZooKeeper.java:1175)
at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:716)
at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:581)
at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:353)
at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:311)
at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:270)
{code}

Also, there is no support for CIDR in IPAuthenticationProvider.isValid, but IPAuthenticationProvider.matches has it.
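A sketch of an isValid() check that accepts both a plain address and the address/prefix-length form. It is hypothetical, covers IPv4 only, and is not IPAuthenticationProvider's code; it just mirrors the CIDR form that matches() already understands, so that an id such as 127.0.0.1/8 would pass validation.

```java
// Hypothetical IPv4-only validator accepting "a.b.c.d" and "a.b.c.d/bits".
final class IpAclValidatorSketch {
    // Parses 1 to 3 ASCII digits; returns -1 for anything else.
    private static int parseSmallDecimal(String s) {
        if (s.isEmpty() || s.length() > 3) {
            return -1;
        }
        int value = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return -1;
            }
            value = value * 10 + (c - '0');
        }
        return value;
    }

    static boolean isValid(String id) {
        String address = id;
        int slash = id.indexOf('/');
        if (slash >= 0) {
            int bits = parseSmallDecimal(id.substring(slash + 1));
            if (bits < 0 || bits > 32) {
                return false; // prefix length must be 0..32 for IPv4
            }
            address = id.substring(0, slash);
        }
        String[] octets = address.split("\\.", -1);
        if (octets.length != 4) {
            return false;
        }
        for (String octet : octets) {
            int v = parseSmallDecimal(octet);
            if (v < 0 || v > 255) {
                return false;
            }
        }
        return true;
    }
}
```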
auth 318211 No Perforce job exists for this issue. 3 318552
5 years, 47 weeks, 5 days ago 0|i1iwnr:
ZooKeeper ZOOKEEPER-1672

zookeeper client does not accept "-members" option in reconfig command

Bug Resolved Trivial Fixed Xiaoshuang Wang Xiaoshuang Wang Xiaoshuang Wang 18/Mar/13 20:40   20/Mar/13 07:30 20/Mar/13 02:21 3.5.0 3.5.0 java client   0 5 0 0 0% Zookeeper trunk Without the modification to src/java/main/org/apache/zookeeper/cli/ReconfigCommand.java line 88, the reconfig command rejects the "-members" option, complaining that the usage is incorrect. 0% 0% 0 0 patch 318169 No Perforce job exists for this issue. 1 318510
7 years, 1 week, 1 day ago
Reviewed
0|i1iwef:
ZooKeeper ZOOKEEPER-1671

Remove dependency on log4j 1.2.15

Bug Open Minor Unresolved Unassigned Alex Blewitt Alex Blewitt 18/Mar/13 09:22   18/Mar/13 09:22           0 1   The zookeeper dependency 3.4.5 (latest) depends explicitly on log4j 1.2.15, which has dependencies on com.sun.jmx which can't be resolved from Maven central.

Please change the dependency to either 1.2.16, which declares these as optional, or 1.2.14, which doesn't have them at all.

http://search.maven.org/remotecontent?filepath=org/apache/zookeeper/zookeeper/3.4.5/zookeeper-3.4.5.pom

{code}
<dependency>
  <groupId>log4j</groupId>
  <artifactId>log4j</artifactId>
  <version>1.2.15</version>
  <scope>compile</scope>
</dependency>
{code}

This should be modified to 1.2.14 or 1.2.16 as above.

It's also not clear why this is used at all; it would be better for ZooKeeper to depend only on slf4j-api, and let users determine what the right slf4j logging implementation is. With the current approach, it's not possible to swap out log4j for something else.
318023 No Perforce job exists for this issue. 0 318364
7 years, 1 week, 3 days ago 0|i1ivhz:
ZooKeeper ZOOKEEPER-1670

zookeeper should set a default value for SERVER_JVMFLAGS and CLIENT_JVMFLAGS so that memory usage is controlled

Bug Resolved Major Fixed Flavio Paiva Junqueira Arpit Gupta Arpit Gupta 15/Mar/13 17:36   18/Dec/19 08:34 01/Oct/13 17:32 3.4.5 3.5.0     0 7   We noticed this with JDK 1.6: if no heap size is set, the process can take up to 1/4 of the memory available on the machine.

More info http://stackoverflow.com/questions/3428251/is-there-a-default-xmx-setting-for-java-1-5

You can run the following command to see the defaults for your machine:

{code}
java -XX:+PrintFlagsFinal -version 2>&1 | grep -i -E 'heapsize|permsize|version'
{code}

On two different classes of machines, we observed that this was 1/4 of the machine's total memory.
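Besides -XX:+PrintFlagsFinal, a running JVM can report its effective heap ceiling through Runtime.maxMemory(). This small hypothetical helper prints it in MiB, which makes it easy to confirm that a -Xmx passed via SERVER_JVMFLAGS or CLIENT_JVMFLAGS took effect (the reported value can be slightly below -Xmx with some collectors).

```java
// Hypothetical helper: report the JVM's effective maximum heap size.
final class MaxHeapCheck {
    static long maxHeapMiB() {
        // maxMemory() returns Long.MAX_VALUE when there is no limit.
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("max heap: " + maxHeapMiB() + " MiB");
    }
}
```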
317766 No Perforce job exists for this issue. 5 318107
13 weeks, 1 day ago 0|i1itwv:
ZooKeeper ZOOKEEPER-1669

Operations to server will be timed-out while thousands of sessions expired same time

Improvement Resolved Major Fixed Cheney Sun tokoot tokoot 15/Mar/13 03:32   03/Aug/17 21:47 03/Aug/17 12:23 3.3.5 3.4.11 server   0 11   If there are thousands of clients and most of them disconnect from the server at the same time (clients restarted, or servers partitioned from clients), the server is kept busy closing those "connections" and becomes unavailable. The problem is in the following code:
{code}
private void closeSessionWithoutWakeup(long sessionId) {
    HashSet<NIOServerCnxn> cnxns;
    synchronized (this.cnxns) {
        cnxns = (HashSet<NIOServerCnxn>) this.cnxns.clone(); // other threads block here
    }
    ...
}
{code}

A real world example that demonstrated this problem (Kudos to [~sun.cheney]):
{noformat}
The issue arises when tens of thousands of clients try to reconnect to the ZooKeeper service.
Actually, we came across the issue while maintaining our HBase cluster, which used a 5-server ZooKeeper cluster.
The HBase cluster was composed of many, many regionservers (on the order of thousands),
and was used by tens of thousands of clients doing massive reads/writes.
Because the r/w throughput was very high, the ZooKeeper zxid increased quickly as well.
Roughly every two or three weeks, the zxid rollover would trigger a ZooKeeper leader re-election.
The re-election caused the clients (HBase regionservers and HBase clients) to disconnect
and reconnect to the ZooKeeper servers at the same time, trying to renew their sessions.

In the current implementation of session renewal, NIOServerCnxnFactory first clones all the connections
to avoid race conditions between threads, then iterates over the cloned connection set one by one to
find the session to renew. This is very time-consuming. In our case (described above),
it prevented many region servers from renewing their sessions before the session timeout,
so the HBase cluster eventually lost these region servers, which hurt HBase stability.
The change refactors the close-session logic and introduces a ConcurrentHashMap
mapping session ids to connections; as a thread-safe data structure, it
eliminates the need to clone the connection set first.
{noformat}
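The ConcurrentHashMap approach described above can be sketched as follows. The class is hypothetical and C stands in for NIOServerCnxn; the point is that closing a session becomes a single map operation rather than a clone-and-scan under the cnxns lock.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: index connections by session id (C stands in for NIOServerCnxn).
final class SessionIndexSketch<C> {
    private final ConcurrentMap<Long, C> bySession = new ConcurrentHashMap<>();

    void register(long sessionId, C cnxn) {
        bySession.put(sessionId, cnxn);
    }

    // Removes and returns the connection for a session (null if none); no global lock, no copy.
    C removeSession(long sessionId) {
        return bySession.remove(sessionId);
    }

    // Removes the mapping only if it still points at this connection.
    boolean unregister(long sessionId, C cnxn) {
        return bySession.remove(sessionId, cnxn);
    }

    int size() {
        return bySession.size();
    }
}
```

The two-argument remove(key, value) matters when a session has already been re-established on a newer connection: closing the stale connection must not unregister the live one.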
performance 317643 No Perforce job exists for this issue. 0 317984
2 years, 32 weeks, 6 days ago 0|i1it5j:
ZooKeeper ZOOKEEPER-1668

“Memory leak” about permgen

Improvement Resolved Major Not A Problem Unassigned tokoot tokoot 15/Mar/13 03:24   10/Oct/13 13:46 10/Oct/13 13:46 3.3.5   jmx, server   0 2   For each connection, a ConnectionBean will be created to represent this connection at finishSessionInit:
| ...
| jmxConnectionBean = new ConnectionBean(this, zk);
| MBeanRegistry.getInstance().register(jmxConnectionBean, zk.jmxServerBean);
|| ...
|| ObjectName oname = makeObjectName(path, bean);
||| ...
||| return new ObjectName(beanName.toString());
|||| ...
|||| _canonicalName = (new String(canonical_chars, 0, prop_index)).intern();

So, every connection consumes dozens of bytes in permgen. As connections are constantly established, permgen usage increases continuously.

Is it reasonable or necessary to manage each connection with ConnectionBean?
317641 No Perforce job exists for this issue. 0 317982
6 years, 24 weeks, 1 day ago 0|i1it53:
ZooKeeper ZOOKEEPER-1667

Watch event isn't handled correctly when a client reconnects to a server

Bug Closed Blocker Fixed Flavio Paiva Junqueira Jacky007 Jacky007 14/Mar/13 04:56   12/May/15 01:37 22/Oct/13 06:56 3.3.6, 3.4.5 3.4.6, 3.5.0 server   1 14   When a client reconnects to a server, it resends the watches that have not yet been triggered, but the code in DataTree does not handle them correctly.

It is obvious; we just didn't notice it :)

Scenarios:
1) Client A sets a data watch on /d, then disconnects; client B deletes /d and creates it again. When client A reconnects, it receives a NodeCreated rather than a NodeDataChanged.
2) Client A sets an exists watch on /e (which does not exist), then disconnects; client B creates /e. When client A reconnects, it receives a NodeDataChanged rather than a NodeCreated.
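The expected behavior for both scenarios can be sketched as a pure decision function. All names are hypothetical: the enum stands in for Watcher.Event.EventType, mzxid is the node's current last-modified zxid (null when the node does not exist), and relativeZxid is the last zxid the client saw before disconnecting.

```java
// Hypothetical sketch of the event a resent watch should produce on reconnect.
final class WatchResetSketch {
    // Stand-in for org.apache.zookeeper.Watcher.Event.EventType.
    enum EventType { None, NodeCreated, NodeDeleted, NodeDataChanged }

    // Data watch: fire if the node is gone or was modified after relativeZxid.
    static EventType forDataWatch(Long mzxid, long relativeZxid) {
        if (mzxid == null) {
            return EventType.NodeDeleted;       // node no longer exists
        }
        if (mzxid > relativeZxid) {
            return EventType.NodeDataChanged;   // modified, or deleted and re-created, while away
        }
        return EventType.None;                  // unchanged: simply re-arm the watch
    }

    // Exists watch set on a node that did not exist: fire NodeCreated if it exists now.
    static EventType forExistWatch(Long mzxid) {
        return mzxid != null ? EventType.NodeCreated : EventType.None;
    }
}
```

Under this sketch, scenario 1 yields NodeDataChanged (the re-created /d has a newer mzxid) and scenario 2 yields NodeCreated.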

317480 No Perforce job exists for this issue. 5 317821
4 years, 45 weeks, 2 days ago 0|i1is5b:
ZooKeeper ZOOKEEPER-1666

Avoid Reverse DNS lookup if the hostname in connection string is literal IP address.

Improvement Closed Major Fixed George Cao George Cao George Cao 14/Mar/13 00:58   30/Jan/17 07:08 13/Nov/13 09:18 3.4.5 3.4.6, 3.5.0 java client   0 12   In our environment, when InetSocketAddress.getHostName() is called and the hosts in the connection string are literal IP addresses, the call triggers a reverse DNS lookup, which is very slow.
In this situation, the host name can simply be set to the IP address without causing any problem.
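In Java 7 and later, InetSocketAddress.getHostString() returns the host name or literal the address was created from without attempting a reverse lookup, unlike getHostName(). A minimal sketch, where the wrapper class is hypothetical:

```java
import java.net.InetSocketAddress;

// Hypothetical wrapper: host string for a connect-string entry, no reverse DNS.
final class HostStringSketch {
    static String hostFor(InetSocketAddress addr) {
        // For an address built from a literal IP this is the literal itself;
        // getHostName() could instead trigger a slow reverse lookup.
        return addr.getHostString();
    }
}
```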
patch, test 317447 No Perforce job exists for this issue. 4 317788
6 years, 2 weeks ago Try to avoid reverse name service look up when the connection string consists of literal IP addresses but not real host names. 0|i1irxz:
ZooKeeper ZOOKEEPER-1665

Support recursive deletion in multi

New Feature Resolved Major Won't Fix Unassigned Ted Yu Ted Yu 14/Mar/13 00:08   02/Apr/14 14:38 02/Apr/14 14:38         0 5   Use case in HBase is that we need to recursively delete multiple subtrees:
{code}
ZKUtil.deleteChildrenRecursively(watcher, acquiredZnode);
ZKUtil.deleteChildrenRecursively(watcher, reachedZnode);
ZKUtil.deleteChildrenRecursively(watcher, abortZnode);
{code}
To achieve high consistency, it is desirable to use multi for the above operations.

This JIRA adds support for recursive deletion in multi.
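Without server-side support, a client can approximate this by listing the subtree first and submitting one delete per node, children before parents, in a single multi. The hypothetical sketch below computes that post-order from an in-memory child map standing in for getChildren(); each resulting path could then become an Op.delete(path, -1) passed to ZooKeeper.multi().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: post-order listing of a subtree (children before parents).
final class SubtreeDeleteOrder {
    // children maps a full path to its child names, like getChildren() would return.
    static List<String> deleteOrder(String root, Map<String, List<String>> children) {
        List<String> out = new ArrayList<>();
        collect(root, children, out);
        return out;
    }

    private static void collect(String path, Map<String, List<String>> children, List<String> out) {
        for (String child : children.getOrDefault(path, Collections.<String>emptyList())) {
            collect(path.equals("/") ? "/" + child : path + "/" + child, children, out);
        }
        out.add(path); // a node is deleted only after all of its descendants
    }
}
```

If the subtree changes between listing and submitting, deleting a now non-empty node fails and the multi aborts as a whole, so the caller can re-list and retry; this is the gap a server-side recursive operation would close.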
317442 No Perforce job exists for this issue. 0 317783
5 years, 51 weeks, 1 day ago 0|i1irwv:
ZooKeeper ZOOKEEPER-1664

Kerberos auth doesn't work with native platform GSS integration

Bug Resolved Major Fixed Unassigned Boaz Kelmer Boaz Kelmer 13/Mar/13 14:58   11/Sep/13 18:00 11/Sep/13 18:00 3.4.5, 3.5.0   java client, server   0 6   Linux (and likely also Solaris). Java on Linux/Solaris can be set up to use the native (via C library)
GSS implementation. This is configured by setting the system property
sun.security.jgss.native=true
When using this feature, ZooKeeper Sasl/JGSS authentication doesn't work.
The reason is explained in
http://docs.oracle.com/javase/6/docs/technotes/guides/security/jgss/jgss-features.html

"""
[when using native GSS...]
In addition, when performing operations as a particular Subject, e.g.
Subject.doAs(...) or Subject.doAsPrivileged(...), the to-be-used
GSSCredential should be added to Subject's private credential set.
Otherwise, the GSS operations will fail since no credential is found.
"""
317331 No Perforce job exists for this issue. 3 317672
6 years, 28 weeks, 1 day ago 0|i1ir87:
ZooKeeper ZOOKEEPER-1663

scripts don't work when path contains spaces

Bug Closed Minor Fixed Amichai Rothman Amichai Rothman Amichai Rothman 12/Mar/13 11:25   26/Jun/14 07:19 20/May/13 13:12 3.4.5 3.4.6, 3.5.0 scripts   0 6   Kubuntu 12.10 (GNU bash 4.2.37) The shell scripts (bin/zk*.sh) don't work when there are spaces in the zookeeper or java paths. 317082 No Perforce job exists for this issue. 4 317423
5 years, 39 weeks ago 0|i1ipov:
ZooKeeper ZOOKEEPER-1662

Fix to two small bugs in ReconfigTest.testPortChange()

Bug Resolved Minor Fixed Alexander Shraer Alexander Shraer Alexander Shraer 08/Mar/13 19:27   11/Mar/14 07:11 10/Mar/14 21:45 3.5.0 3.5.0 tests   0 4   Fix to two small bugs in ReconfigTest.testPortChange():
1. the test expected a port change to happen immediately, which is not necessarily
going to happen. The fix waits a bit and also tries several times.
2. when a client port changes, the test created a new ZooKeeper handle, but didn't specify a Watcher object, which generated some NullPointerException events when the watcher was triggered.
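Fix 1 amounts to polling with a bounded number of attempts instead of asserting once. A generic sketch, with a hypothetical helper name:

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper: re-check a condition a few times before giving up.
final class RetryUntil {
    static boolean retry(BooleanSupplier condition, int attempts, long sleepMillis)
            throws InterruptedException {
        for (int i = 0; i < attempts; i++) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (i + 1 < attempts) {
                Thread.sleep(sleepMillis); // give the port change time to take effect
            }
        }
        return false;
    }
}
```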
316640 No Perforce job exists for this issue. 1 316982
6 years, 2 weeks, 2 days ago 0|i1imz3:
ZooKeeper ZOOKEEPER-1661

Random (?) 5s delay when establishing connection

Bug Open Major Unresolved Unassigned Yan Pujante Yan Pujante 07/Mar/13 16:11   12/Mar/13 06:41   3.4.5   server   1 6   I have a client connecting to ZooKeeper and I am sometimes seeing a 5s delay before the opening of the socket connection:

Here is the output on the client side:
{noformat}
2013/03/07 10:53:48.729 INFO [org.apache.zookeeper.ZooKeeper] Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
2013/03/07 10:53:48.729 INFO [org.apache.zookeeper.ZooKeeper] Client environment:host.name=xeon
2013/03/07 10:53:48.729 INFO [org.apache.zookeeper.ZooKeeper] Client environment:java.version=1.6.0_41
2013/03/07 10:53:48.729 INFO [org.apache.zookeeper.ZooKeeper] Client environment:java.vendor=Apple Inc.
2013/03/07 10:53:48.729 INFO [org.apache.zookeeper.ZooKeeper] Client environment:java.home=/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home
2013/03/07 10:53:48.729 INFO [org.apache.zookeeper.ZooKeeper] Client environment:java.class.path=/local/java/lib/tools.jar:lib/ant-1.8.4.jar:lib/ant-antlr-1.8.4.jar:lib/ant-junit-1.8.4.jar:lib/ant-launcher-1.8.4.jar:lib/antlr-2.7.7.jar:lib/asm-4.0.jar:lib/asm-analysis-4.0.jar:lib/asm-commons-4.0.jar:lib/asm-tree-4.0.jar:lib/asm-util-4.0.jar:lib/commons-cli-1.2.jar:lib/groovy-2.0.7.jar:lib/groovy-ant-2.0.7.jar:lib/groovy-groovydoc-2.0.7.jar:lib/groovy-templates-2.0.7.jar:lib/groovy-xml-2.0.7.jar:lib/jackson-annotations-2.1.4.jar:lib/jackson-core-2.1.4.jar:lib/jackson-databind-2.1.4.jar:lib/jline-0.9.94.jar:lib/json-20090211.jar:lib/jul-to-slf4j-1.6.2.jar:lib/junit-3.8.1.jar:lib/log4j-1.2.16.jar:lib/netty-3.2.2.Final.jar:lib/org.linkedin.util-core-1.8.glu47.0.jar:lib/org.linkedin.util-groovy-1.8.glu47.0.jar:lib/org.linkedin.zookeeper-cli-impl-1.5.glu47.0-SNAPSHOT.jar:lib/org.linkedin.zookeeper-impl-1.5.glu47.0-SNAPSHOT.jar:lib/slf4j-api-1.6.2.jar:lib/slf4j-log4j12-1.6.2.jar:lib/zookeeper-3.4.5.jar
2013/03/07 10:53:48.730 INFO [org.apache.zookeeper.ZooKeeper] Client environment:java.library.path=/local/instantclient10:.:/Users/ypujante/Library/Java/Extensions:/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java
2013/03/07 10:53:48.730 INFO [org.apache.zookeeper.ZooKeeper] Client environment:java.io.tmpdir=/var/folders/dj/qmkmx5648xjf2n006s7hc1v80000gq/T/
2013/03/07 10:53:48.730 INFO [org.apache.zookeeper.ZooKeeper] Client environment:java.compiler=<NA>
2013/03/07 10:53:48.730 INFO [org.apache.zookeeper.ZooKeeper] Client environment:os.name=Mac OS X
2013/03/07 10:53:48.730 INFO [org.apache.zookeeper.ZooKeeper] Client environment:os.arch=x86_64
2013/03/07 10:53:48.730 INFO [org.apache.zookeeper.ZooKeeper] Client environment:os.version=10.8.2
2013/03/07 10:53:48.730 INFO [org.apache.zookeeper.ZooKeeper] Client environment:user.name=ypujante
2013/03/07 10:53:48.730 INFO [org.apache.zookeeper.ZooKeeper] Client environment:user.home=/Users/ypujante
2013/03/07 10:53:48.730 INFO [org.apache.zookeeper.ZooKeeper] Client environment:user.dir=/export/content/linkedin-zookeeper/org.linkedin.zookeeper-cli-1.5.glu47.0-SNAPSHOT
2013/03/07 10:53:48.731 INFO [org.apache.zookeeper.ZooKeeper] Initiating client connection, connectString=localhost:2181 sessionTimeout=100 watcher=org.linkedin.zookeeper.client.ZKClient@3823bdd1
2013/03/07 10:53:48.737 DEBUG [org.apache.zookeeper.ClientCnxn] zookeeper.disableAutoWatchReset is false
2013/03/07 10:53:48.756 DEBUG [org.linkedin.zookeeper.cli.ClientMain] Talking to zookeeper on localhost:2181
2013/03/07 10:53:53.763 INFO [org.apache.zookeeper.ClientCnxn] Opening socket connection to server fe80:0:0:0:0:0:0:1%1/fe80:0:0:0:0:0:0:1%1:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
{noformat}

From this output you can see the line at 10:53:48 => Initiating client connection
And then 5s later, at 10:53:53 => opening socket connection

Note that I did not see this delay/problem prior to upgrading to 3.4.5 (from 3.3.3)

Also note that sometimes there is no delay as in the following output!

{noformat}
2013/03/07 11:04:06.084 INFO [org.apache.zookeeper.ZooKeeper] Initiating client connection, connectString=localhost:2181 sessionTimeout=100 watcher=org.linkedin.zookeeper.client.ZKClient@1e670479
2013/03/07 11:04:06.089 DEBUG [org.apache.zookeeper.ClientCnxn] zookeeper.disableAutoWatchReset is false
2013/03/07 11:04:06.109 DEBUG [org.linkedin.zookeeper.cli.ClientMain] Talking to zookeeper on localhost:2181
2013/03/07 11:04:06.116 INFO [org.apache.zookeeper.ClientCnxn] Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
{noformat}

I will be more than happy to provide more details if necessary. The client code is open source and hosted on github @ https://github.com/linkedin/linkedin-zookeeper/blob/master/org.linkedin.zookeeper-cli-impl/src/main/groovy/org/linkedin/zookeeper/cli/ClientMain.groovy#L65

and it does little more than (under the covers)

new ZooKeeper("localhost:2181", 100, watcher)

and then waits until the SyncConnected event is received...

Thanks
Yan
316385 No Perforce job exists for this issue. 0 316728
7 years, 2 weeks, 2 days ago 0|i1ilen:
ZooKeeper ZOOKEEPER-1660

ZOOKEEPER-1987 Add documentation for dynamic reconfiguration

Sub-task Resolved Blocker Fixed Reed Wanderman-Milne Alexander Shraer Alexander Shraer 07/Mar/13 02:07   10/Nov/14 18:47 29/Aug/14 10:39 3.5.0 3.5.1, 3.6.0 documentation   0 10   Update user manual with reconfiguration info. 316233 No Perforce job exists for this issue. 3 316576
5 years, 19 weeks, 3 days ago
Reviewed
0|i1ikgv:
ZooKeeper ZOOKEEPER-1659

Add JMX support for dynamic reconfiguration

Bug Resolved Blocker Fixed Rakesh Radhakrishnan Alexander Shraer Alexander Shraer 07/Mar/13 01:50   04/Jun/14 16:50 04/Jun/14 16:08 3.5.0 3.5.0 server   0 10   We need to update JMX during reconfigurations. Currently, reconfiguration changes are not reflected in JConsole. 316231 No Perforce job exists for this issue. 8 316574
5 years, 42 weeks, 1 day ago 0|i1ikgf:
ZooKeeper ZOOKEEPER-1658

Support SRV records

Improvement Open Minor Unresolved Unassigned Devin Bayer Devin Bayer 03/Mar/13 07:12   03/Mar/13 07:12           2 3   We want to make client configuration easy, so the quorum is just a single A RRset with multiple IP addresses. This isn't ideal because we need to hard-code the IPs of our zookeeper servers, which already have domain names. If zookeeper supported SRV, we could just do:

_zookeeper.example.com. 86400 IN SRV 10 60 2181 worker1
_zookeeper.example.com. 86400 IN SRV 10 20 2181 worker2
_zookeeper.example.com. 86400 IN SRV 10 10 2181 worker3

and -Dhbase.zookeeper.quorum=example.com
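Note that RFC 2782 names SRV records _service._proto.name, so such records would more conventionally live at _zookeeper._tcp.example.com. Below is a hypothetical sketch of turning SRV record data into a connect string. Fetching the records (for example through JNDI's DNS provider and DirContext.getAttributes(name, new String[] {"SRV"})) is not shown, and priority and weight are ignored here on the assumption that the ZooKeeper client shuffles its server list itself.

```java
import java.util.List;

// Hypothetical sketch: SRV RDATA strings ("priority weight port target") to "host:port,...".
final class SrvConnectString {
    static String fromSrvData(List<String> records) {
        StringBuilder connect = new StringBuilder();
        for (String record : records) {
            String[] f = record.trim().split("\\s+");
            if (f.length != 4) {
                throw new IllegalArgumentException("unexpected SRV data: " + record);
            }
            int port = Integer.parseInt(f[2]);
            // Strip the trailing root dot of a fully qualified target name.
            String target = f[3].endsWith(".") ? f[3].substring(0, f[3].length() - 1) : f[3];
            if (connect.length() > 0) {
                connect.append(',');
            }
            connect.append(target).append(':').append(port);
        }
        return connect.toString();
    }
}
```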
dns, zookeeper 315513 No Perforce job exists for this issue. 0 315857
7 years, 3 weeks, 4 days ago 0|i1ig1b:
ZooKeeper ZOOKEEPER-1657

Increased CPU usage by unnecessary SASL checks

Bug Closed Major Fixed Philip K. Warren Gunnar Wagenknecht Gunnar Wagenknecht 01/Mar/13 11:44   13/Mar/14 14:16 18/Sep/13 06:58 3.4.5 3.4.6, 3.5.0 java client   1 10   I did some profiling in one of our Java environments and found an interesting footprint in ZooKeeper. The SASL support code seems to run very often on the client, although SASL is not even in use.

Is there a switch to disable SASL completely?

The attached screenshot shows a 10-minute profiling session on one of our production Jetty servers. The Jetty server handles ~1k web requests per minute. The average response time per web request is a few milliseconds. The profiling was performed on a machine that had been running for >24h.

We noticed a significant CPU increase on our servers when deploying an update from ZooKeeper 3.3.2 to ZooKeeper 3.4.5, so we started investigating. The screenshot shows that only 32% of CPU time is spent in Jetty, while 65% is spent in ZooKeeper.

A few notes/thoughts:
* {{ClientCnxn$SendThread.clientTunneledAuthenticationInProgress}} seems to be the culprit
* {{javax.security.auth.login.Configuration.getConfiguration}} seems to be called very often?
* There is quite a bit of reflection involved in {{java.security.AccessController.doPrivileged}}
* No security manager is active in the JVM. I would place an if-check in the code before calling {{AccessController.doPrivileged}}: when no SM is installed, the action can be run directly, which saves cycles.
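A minimal sketch of the last suggestion, assuming the call sites can pass a `PrivilegedAction`: run the action directly when `System.getSecurityManager()` returns null, and go through `AccessController.doPrivileged` only otherwise. `PrivilegedShortcut` is a hypothetical helper, not the change that resolved this issue. Both APIs are deprecated for removal since Java 17 (JEP 411), hence the warning suppression.

```java
import java.security.AccessController;
import java.security.PrivilegedAction;

final class PrivilegedShortcut {
    /**
     * Runs the action directly when no SecurityManager is installed, skipping the
     * AccessController.doPrivileged machinery; otherwise falls back to doPrivileged.
     */
    @SuppressWarnings("removal")
    static <T> T run(PrivilegedAction<T> action) {
        if (System.getSecurityManager() == null) {
            return action.run();
        }
        return AccessController.doPrivileged(action);
    }

    public static void main(String[] args) {
        String version = run(() -> System.getProperty("java.version"));
        System.out.println("java.version=" + version);
    }
}
```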
performance 315344 No Perforce job exists for this issue. 9 315688
6 years, 2 weeks ago 0|i1iezr:
ZooKeeper ZOOKEEPER-1656

OSGI - Missing import package - ClassNotFoundException

Bug Open Major Unresolved Unassigned Florian Pirchner Florian Pirchner 01/Mar/13 11:20   09/Oct/13 02:33   3.4.5       1 4   OSGi Import-Package entries are missing for the bundle org.apache.hadoop.zookeeper.

I am getting an exception running the Zookeeper server in an OSGi environment.

ZooKeeperServerMain uses:
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

But there is no import in MANIFEST.MF:
Import-Package: javax.management,org.apache.log4j,org.osgi.framework;version="[1.4,2.0)",org.osgi.util.tracker;version="[1.1,2.0)"


I am sure that subpackages of org.apache.log4j, such as org.apache.log4j.jmx, are also missing.
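To catch this kind of problem at build time, one could compare a bundle's Import-Package header against the packages the code is known to need. The following is a hedged sketch of such a check. `ImportPackageCheck` is hypothetical and not part of ZooKeeper or any OSGi tool. It splits the header on commas outside quoted version ranges, and it treats any `;`-separated token containing `=` (attributes like `version=` and directives like `resolution:=`) as a parameter rather than a package name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.jar.Manifest;

final class ImportPackageCheck {
    /** Parses manifest text; every line, including the last, must end with a newline. */
    static Manifest parse(String text) {
        try {
            return new Manifest(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Package names declared in an Import-Package header; parameters are dropped. */
    static Set<String> importedPackages(String header) {
        List<String> clauses = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : header.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
            }
            if (c == ',' && !inQuotes) { // commas inside "[1.4,2.0)" do not end a clause
                clauses.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        clauses.add(current.toString());
        Set<String> packages = new LinkedHashSet<>();
        for (String clause : clauses) {
            for (String part : clause.split(";")) {
                String token = part.trim();
                if (!token.isEmpty() && !token.contains("=")) {
                    packages.add(token);
                }
            }
        }
        return packages;
    }

    /** Required packages that the manifest's Import-Package header does not declare. */
    static List<String> missing(Manifest manifest, List<String> required) {
        String header = manifest.getMainAttributes().getValue("Import-Package");
        Set<String> declared = header == null ? new LinkedHashSet<String>() : importedPackages(header);
        List<String> result = new ArrayList<>();
        for (String pkg : required) {
            if (!declared.contains(pkg)) {
                result.add(pkg);
            }
        }
        return result;
    }
}
```

`java.util.jar.Manifest` also rejoins wrapped continuation lines (a line starting with a single space), so a header split across lines, as in a real MANIFEST.MF, is read as one value.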


Best, Florian




315339 No Perforce job exists for this issue. 0 315683
6 years, 24 weeks, 1 day ago 0|i1ieyn:
ZooKeeper ZOOKEEPER-1655

Make jline dependency optional in maven pom

Bug Resolved Major Fixed Thomas Weise Thomas Weise Thomas Weise 28/Feb/13 00:07   11/Oct/16 14:15 01/Oct/13 17:42 3.4.2 3.5.0 build   0 7   The old JLine version used in the ZK CLI should not be pulled into downstream projects.
315051 No Perforce job exists for this issue. 2 315395
3 years, 23 weeks, 2 days ago
Reviewed
0|i1id6n:
ZooKeeper ZOOKEEPER-1654

bad documentation link on site

Bug Open Minor Unresolved Michael Han Camille Fournier Camille Fournier 27/Feb/13 16:00   06/Sep/17 09:44   3.4.5       0 3   If you go to this page:
http://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html

Then click on Developer -> API Docs, and you'll get to
http://zookeeper.apache.org/doc/trunk/api/index.html

which does not exist. It should point to:

http://zookeeper.apache.org/doc/current/api/index.html
314984 No Perforce job exists for this issue. 0 315328
2 years, 28 weeks, 1 day ago 0|i1icrr:
ZooKeeper ZOOKEEPER-1653

zookeeper fails to start because of inconsistent epoch

Bug Closed Blocker Fixed Michi Mutsuzaki Michi Mutsuzaki Michi Mutsuzaki 26/Feb/13 20:59   12/Jan/16 10:25 26/Nov/13 18:44 3.4.5 3.4.6 quorum   1 11   It looks like QuorumPeer.loadDataBase() could fail if the server was restarted after zk.takeSnapshot() but before finishing self.setCurrentEpoch(newEpoch) in Learner.java.

{code:java}
case Leader.NEWLEADER: // it will be NEWLEADER in v1.0
zk.takeSnapshot();
self.setCurrentEpoch(newEpoch); // <<< got restarted here
snapshotTaken = true;
writePacket(new QuorumPacket(Leader.ACK, newLeaderZxid, null, null), true);
break;
{code}

The server fails to start because currentEpoch is still 1, but the last processed zxid from the snapshot has already been updated.

{noformat}
2013-02-20 13:45:02,733 5543 [pool-1-thread-1] ERROR org.apache.zookeeper.server.quorum.QuorumPeer - Unable to load database on disk
java.io.IOException: The current epoch, 1, is older than the last zxid, 8589934592
at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:439)
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:413)
...
{noformat}

{noformat}
$ find datadir
datadir
datadir/version-2
datadir/version-2/currentEpoch.tmp
datadir/version-2/acceptedEpoch
datadir/version-2/snapshot.0
datadir/version-2/currentEpoch
datadir/version-2/snapshot.200000000

$ cat datadir/version-2/currentEpoch.tmp
2%
$ cat datadir/version-2/acceptedEpoch
2%
$ cat datadir/version-2/currentEpoch
1%
{noformat}
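The numbers in the log and the directory listing fit together once you recall that a zxid packs the epoch into its upper 32 bits and a per-epoch counter into its lower 32 bits. The hedged sketch below decodes the zxid from the hexadecimal snapshot file name and reproduces the failing comparison: 8589934592 is 0x200000000, which means epoch 2 and counter 0. That epoch is newer than the value 1 still in currentEpoch, while currentEpoch.tmp already holds 2. The helper names are illustrative, and the check is modeled on the error message rather than copied from QuorumPeer.

```java
final class ZxidEpochCheck {
    /** Upper 32 bits of a zxid: the leader epoch. */
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    /** Lower 32 bits of a zxid: the transaction counter within that epoch. */
    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    /** Snapshot files are named snapshot.<zxid in hex>. */
    static long zxidFromSnapshotName(String fileName) {
        return Long.parseLong(fileName.substring(fileName.lastIndexOf('.') + 1), 16);
    }

    /** The on-disk epoch must not lag behind the epoch of the data that was loaded. */
    static void checkEpoch(long currentEpoch, long lastProcessedZxid) {
        if (epochOf(lastProcessedZxid) > currentEpoch) {
            throw new IllegalStateException("The current epoch, " + currentEpoch
                    + ", is older than the last zxid, " + lastProcessedZxid);
        }
    }

    public static void main(String[] args) {
        long zxid = zxidFromSnapshotName("snapshot.200000000");
        System.out.println(zxid + " -> epoch " + epochOf(zxid) + ", counter " + counterOf(zxid));
        checkEpoch(2, zxid); // passes: matches the value in currentEpoch.tmp
        try {
            checkEpoch(1, zxid); // fails like the log above: currentEpoch still holds 1
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```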
314830 No Perforce job exists for this issue. 5 315174
4 years, 10 weeks, 2 days ago ZOOKEEPER-1549.patch should fix this issue in 3.5 branch. 0|i1ibtj:
ZooKeeper ZOOKEEPER-1652

zookeeper java client does a reverse dns lookup when connecting

Bug Resolved Critical Duplicate Sean Bridges Sean Bridges Sean Bridges 26/Feb/13 13:45   26/Oct/16 10:25 04/Nov/13 21:34 3.4.5   java client   1 10   When connecting to ZooKeeper, the client does a reverse DNS lookup on the server address. In our environment, the reverse DNS lookup takes 5 seconds to fail, causing ZooKeeper clients to connect slowly.

The reverse DNS lookup occurs in ClientCnxn, in the calls to addr.getHostName():

{code}
setName(getName().replaceAll("\\(.*\\)",
"(" + addr.getHostName() + ":" + addr.getPort() + ")"));
try {
zooKeeperSaslClient = new ZooKeeperSaslClient("zookeeper/"+addr.getHostName());
} catch (LoginException e) {
{code}
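When only a display name is needed, `InetSocketAddress.getHostString()` (Java 7+) returns the hostname the address was created with, or the literal IP, without attempting a reverse lookup. The sketch below applies that to the thread-renaming line quoted above. `ThreadNaming` is a hypothetical helper. Whether the SASL principal (`"zookeeper/" + host`) can safely use the unresolved name depends on how the Kerberos principals are set up, so that part is deliberately not shown.

```java
import java.net.InetSocketAddress;
import java.util.regex.Matcher;

final class ThreadNaming {
    /** "host:port" without triggering a reverse DNS lookup (unlike getHostName()). */
    static String describe(InetSocketAddress addr) {
        return addr.getHostString() + ":" + addr.getPort();
    }

    /** Replaces the "(...)" part of a thread name, as the quoted ClientCnxn code does. */
    static String renamed(String threadName, InetSocketAddress addr) {
        // quoteReplacement guards against '$' or '\' being treated as regex group syntax.
        return threadName.replaceAll("\\(.*\\)", Matcher.quoteReplacement("(" + describe(addr) + ")"));
    }

    public static void main(String[] args) {
        InetSocketAddress addr = InetSocketAddress.createUnresolved("zk1.example.com", 2181);
        System.out.println(renamed("main-SendThread()", addr));
    }
}
```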
314711 No Perforce job exists for this issue. 1 315055
3 years, 21 weeks, 1 day ago 0|i1ib33:
ZooKeeper ZOOKEEPER-1651

Add support for compressed snapshot

Improvement Resolved Major Fixed Brian Nixon Thawan Kooburat Thawan Kooburat 25/Feb/13 18:56   02/May/19 04:47 02/May/19 04:46   3.6.0 server   0 7   We want to keep many copies of snapshots on disk so that we can debug problems afterward. However, snapshots can be large, so we added a feature that allows the server to dump and load snapshots in a compressed format (snappy or gzip). This also improves database loading and snapshotting time.

The benefit also depends on the client workload. In one of our deployments, where clients don't compress their data, we found that snappy compression works best: the snapshot size was reduced from 381 MB to 65 MB, and database loading and snapshotting time were also reduced by 20%.
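As a minimal illustration of the gzip variant, the sketch below compresses and restores an in-memory snapshot image with the JDK's `java.util.zip` streams. Snappy would need a third-party library, so it is not shown. `SnapshotCompression` is a hypothetical helper, not the code merged for this issue. A real server would stream to and from snapshot files rather than hold byte arrays.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class SnapshotCompression {
    static byte[] compress(byte[] snapshot) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(snapshot);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // taken after close(), which writes the gzip trailer
    }

    static byte[] decompress(byte[] compressed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gzip.readAllBytes(); // Java 9+
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            sb.append("/app/config/node-").append(i).append('\n');
        }
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(raw);
        System.out.println(raw.length + " bytes -> " + packed.length + " bytes");
    }
}
```

How much this saves depends, as noted above, on whether clients already store compressed data in their znodes.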
314518 No Perforce job exists for this issue. 0 314862
46 weeks ago 0|i1i9w7:
ZooKeeper ZOOKEEPER-1650

testServerCnxnExpiry failing consistently on solaris apache jenkins

Bug Resolved Blocker Duplicate Rakesh Radhakrishnan Patrick D. Hunt Patrick D. Hunt 20/Feb/13 12:30   16/Mar/14 12:37 16/Mar/14 08:21 3.5.0 3.5.0 tests   0 2   testServerCnxnExpiry is failing consistently on solaris apache jenkins:
https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-trunk-solaris/475/testReport/org.apache.zookeeper.test/ServerCnxnTest/testServerCnxnExpiry/

Seems to have started around the time the NIO multi-threading changes were introduced - but it's hard to say (some of the history has been lost already).

Possibly just a bad test, or timeouts that are not long enough...
313725 No Perforce job exists for this issue. 0 314070
6 years, 1 week, 4 days ago 0|i1i507:
ZooKeeper ZOOKEEPER-1649

Build RPM Package Error on CentOS 5

Bug Resolved Major Won't Fix Unassigned Shining Shining 19/Feb/13 04:47   03/Mar/16 11:22 03/Mar/16 11:22 3.4.5   build   0 1   CentOS 5.8 x86_64
JDK 1.6.0_21-b06
ant rpm
--------------------

rpm:
[copy] Copying 1 file to /tmp/zkpython_build_nshi/SOURCES
[rpm] Building the RPM based on the zkpython.spec file
[rpm] Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.62078
[rpm] Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.62078
[rpm] Executing(%install): /bin/sh -e /var/tmp/rpm-tmp.62078
[rpm]
[rpm]
[rpm] RPM build errors:
[rpm] + umask 022
[rpm] + cd /tmp/zkpython_build_nshi/BUILD
[rpm] + LANG=C
[rpm] + export LANG
[rpm] + unset DISPLAY
[rpm] + tar fxz /tmp/zkpython_build_nshi/SOURCES/ZooKeeper-0.4.linux-x86_64.tar.gz -C /tmp/zkpython_build_nshi/BUILD
[rpm] + exit 0
[rpm] + umask 022
[rpm] + cd /tmp/zkpython_build_nshi/BUILD
[rpm] + LANG=C
[rpm] + export LANG
[rpm] + unset DISPLAY
[rpm] + exit 0
[rpm] + umask 022
[rpm] + cd /tmp/zkpython_build_nshi/BUILD
[rpm] + LANG=C
[rpm] + export LANG
[rpm] + unset DISPLAY
[rpm] + /bin/mv /tmp/zkpython_build_nshi/BUILD/usr /tmp/zkpython_build_nshi/BUILD
[rpm] /bin/mv: `/tmp/zkpython_build_nshi/BUILD/usr' and `/tmp/zkpython_build_nshi/BUILD/usr' are the same file
[rpm] error: Bad exit status from /var/tmp/rpm-tmp.62078 (%install)
[rpm] Bad exit status from /var/tmp/rpm-tmp.62078 (%install)

BUILD FAILED
/home/nshi/workspace/zookeeper-3.4.5/build.xml:955: The following error occurred while executing this line:
/home/nshi/workspace/zookeeper-3.4.5/src/contrib/build.xml:75: The following error occurred while executing this line:
/home/nshi/workspace/zookeeper-3.4.5/src/contrib/zkpython/build.xml:144: '/usr/bin/rpmbuild' failed with exit code 1
-------------
313457 No Perforce job exists for this issue. 0 313802
4 years, 3 weeks ago 0|i1i3cn:
ZooKeeper ZOOKEEPER-1648

Fix WatcherTest in JDK7

Bug Closed Minor Fixed Thawan Kooburat Thawan Kooburat Thawan Kooburat 18/Feb/13 18:27   13/Mar/14 14:17 19/Feb/13 02:56   3.4.6, 3.5.0 tests   0 4   JDK 7 runs unit tests in random order, causing intermittent WatcherTest failures. The fix is to clean up static variables that interfere with other tests. 313408 No Perforce job exists for this issue. 2 313753
6 years, 2 weeks ago
Reviewed
0|i1i31r:
ZooKeeper ZOOKEEPER-1647

OSGi package import/export changes not applied to bin-jar

Bug Closed Major Fixed Arnoud Glimmerveen Arnoud Glimmerveen Arnoud Glimmerveen 17/Feb/13 10:03   13/Mar/14 14:16 19/Feb/13 03:29 3.4.6, 3.5.0 3.4.6, 3.5.0     0 4   Two recent changes related to the OSGi headers Import-Package and Export-Package (ZOOKEEPER-1334 and ZOOKEEPER-1645) were only applied to the JAR created in ant target *jar*, leaving the JAR created in target *bin-jar* (to be uploaded to Maven central) with the old (incorrect) OSGi headers. 313248 No Perforce job exists for this issue. 1 313593
6 years, 2 weeks ago
Reviewed
0|i1i227:
ZooKeeper ZOOKEEPER-1646

mt c client tests fail on Ubuntu Raring

Bug Closed Blocker Fixed Patrick D. Hunt James Page James Page 12/Feb/13 05:07   13/Mar/14 14:17 17/Oct/13 13:08 3.4.5, 3.5.0 3.4.6, 3.5.0 c client, tests   0 5   Ubuntu 13.04 (raring), glibc 2.17 Misc tests fail in the c client binding under the current Ubuntu development release:

./zktest-mt
ZooKeeper server startedRunning
Zookeeper_clientretry::testRetry ZooKeeper server started ZooKeeper server started : elapsed 9315 : OK
Zookeeper_operations::testAsyncWatcher1 : assertion : elapsed 1054
Zookeeper_operations::testAsyncGetOperation : assertion : elapsed 1055
Zookeeper_operations::testOperationsAndDisconnectConcurrently1 : assertion : elapsed 1066
Zookeeper_operations::testOperationsAndDisconnectConcurrently2 : elapsed 0 : OK
Zookeeper_operations::testConcurrentOperations1 : assertion : elapsed 1055
Zookeeper_init::testBasic : elapsed 1 : OK
Zookeeper_init::testAddressResolution : elapsed 0 : OK
Zookeeper_init::testMultipleAddressResolution : elapsed 0 : OK
Zookeeper_init::testNullAddressString : elapsed 0 : OK
Zookeeper_init::testEmptyAddressString : elapsed 0 : OK
Zookeeper_init::testOneSpaceAddressString : elapsed 0 : OK
Zookeeper_init::testTwoSpacesAddressString : elapsed 0 : OK
Zookeeper_init::testInvalidAddressString1 : elapsed 0 : OK
Zookeeper_init::testInvalidAddressString2 : elapsed 175 : OK
Zookeeper_init::testNonexistentHost : elapsed 92 : OK
Zookeeper_init::testOutOfMemory_init : elapsed 0 : OK
Zookeeper_init::testOutOfMemory_getaddrs1 : elapsed 0 : OK
Zookeeper_init::testOutOfMemory_getaddrs2 : elapsed 1 : OK
Zookeeper_init::testPermuteAddrsList : elapsed 0 : OK
Zookeeper_close::testIOThreadStoppedOnExpire : assertion : elapsed 1056
Zookeeper_close::testCloseUnconnected : elapsed 0 : OK
Zookeeper_close::testCloseUnconnected1 : elapsed 91 : OK
Zookeeper_close::testCloseConnected1 : assertion : elapsed 1056
Zookeeper_close::testCloseFromWatcher1 : assertion : elapsed 1076
Zookeeper_simpleSystem::testAsyncWatcherAutoReset ZooKeeper server started : elapsed 12155 : OK
Zookeeper_simpleSystem::testDeserializeString : elapsed 0 : OK
Zookeeper_simpleSystem::testNullData : elapsed 1031 : OK
Zookeeper_simpleSystem::testIPV6 : elapsed 1005 : OK
Zookeeper_simpleSystem::testPath : elapsed 1024 : OK
Zookeeper_simpleSystem::testPathValidation : elapsed 1053 : OK
Zookeeper_simpleSystem::testPing : elapsed 17287 : OK
Zookeeper_simpleSystem::testAcl : elapsed 1019 : OK
Zookeeper_simpleSystem::testChroot : elapsed 3052 : OK
Zookeeper_simpleSystem::testAuth : assertion : elapsed 7010
Zookeeper_simpleSystem::testHangingClient : elapsed 1015 : OK
Zookeeper_simpleSystem::testWatcherAutoResetWithGlobal ZooKeeper server started ZooKeeper server started ZooKeeper server started : elapsed 20556 : OK
Zookeeper_simpleSystem::testWatcherAutoResetWithLocal ZooKeeper server started ZooKeeper server started ZooKeeper server started : elapsed 20563 : OK
Zookeeper_simpleSystem::testGetChildren2 : elapsed 1041 : OK
Zookeeper_multi::testCreate : elapsed 1017 : OK
Zookeeper_multi::testCreateDelete : elapsed 1007 : OK
Zookeeper_multi::testInvalidVersion : elapsed 1011 : OK
Zookeeper_multi::testNestedCreate : elapsed 1009 : OK
Zookeeper_multi::testSetData : elapsed 6019 : OK
Zookeeper_multi::testUpdateConflict : elapsed 1014 : OK
Zookeeper_multi::testDeleteUpdateConflict : elapsed 1007 : OK
Zookeeper_multi::testAsyncMulti : elapsed 2001 : OK
Zookeeper_multi::testMultiFail : elapsed 1006 : OK
Zookeeper_multi::testCheck : elapsed 1020 : OK
Zookeeper_multi::testWatch : elapsed 2013 : OK
Zookeeper_watchers::testDefaultSessionWatcher1zktest-mt: tests/ZKMocks.cc:271: SyncedBoolCondition DeliverWatchersWrapper::isDelivered() const: Assertion `i<1000' failed.
Aborted (core dumped)

It would appear that the ZooKeeper connection does not transition to connected within the required time; increasing the allowed time made no difference.

Ubuntu raring has glibc 2.17; the test suite works fine on previous Ubuntu releases and this is the only difference that stood out.

Interestingly, cli_mt worked fine when connecting to the same ZooKeeper instance that the tests left running, so I'm assuming this is a test error rather than an actual bug.
312414 No Perforce job exists for this issue. 1 312760
6 years, 2 weeks ago
Reviewed
0|i1hwwv:
ZooKeeper ZOOKEEPER-1645

ZooKeeper OSGi package imports not complete

Bug Closed Major Fixed Arnoud Glimmerveen Arnoud Glimmerveen Arnoud Glimmerveen 12/Feb/13 02:08   13/Mar/14 14:16 15/Feb/13 19:50 3.4.6, 3.5.0 3.4.6, 3.5.0     0 5   The ZooKeeper bundle relies on three packages it currently does not declare in the Import-Package MANIFEST header: {{javax.security.auth.callback}} , {{javax.security.auth.login}} and {{javax.security.sasl}} . By adding these the ZooKeeper jar will be a valid OSGi bundle. 312388 No Perforce job exists for this issue. 1 312734
6 years, 2 weeks ago
Reviewed
0|i1hwr3:
ZooKeeper ZOOKEEPER-1644

Add support for compressed SetWatches packet

Improvement Open Major Unresolved Unassigned Thawan Kooburat Thawan Kooburat 11/Feb/13 17:13   15/Feb/13 19:42       c client, java client, server   0 3   On reconnecting to a server to restore its session, a client has to send all watched paths to the server in a SetWatches packet. This packet can potentially be large and exceed the server-side buffer limit (jute.maxbuffer), causing the session to fail. We have two concerns.

1. We can increase jute.maxbuffer to an arbitrary size as a simple workaround, but in our use case the number of watches is going to keep growing.

2. If a large number of clients get disconnected at once, the server may receive a large amount of data over the network because of the flood of SetWatches packets.

In our case, the watch paths should be highly compressible, so our current plan is to add a new request type: a compressed SetWatches request. It should be possible to support multiple compression schemes. We are probably going to use snappy compression, but may add gzip support as a default to minimize external dependencies.
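A rough sketch of the idea, assuming a made-up payload layout: a path count followed by each path in `DataOutput.writeUTF` form, all deflate-compressed with `java.util.zip`. This is neither ZooKeeper's SetWatches wire format nor a definition of the proposed request type. It only shows how a list of similar watch paths shrinks and round-trips. `CompressedWatchList` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

final class CompressedWatchList {
    /** Hypothetical payload: deflate(count, path_1, ..., path_n), each path written with writeUTF. */
    static byte[] encode(List<String> paths) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new DeflaterOutputStream(bytes))) {
            out.writeInt(paths.size());
            for (String path : paths) {
                out.writeUTF(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // read after close(), which finishes the deflate stream
    }

    static List<String> decode(byte[] payload) {
        try (DataInputStream in = new DataInputStream(
                new InflaterInputStream(new ByteArrayInputStream(payload)))) {
            int count = in.readInt();
            List<String> paths = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                paths.add(in.readUTF());
            }
            return paths;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Comparing `encode(paths).length` with the uncompressed size gives a quick estimate of how far below jute.maxbuffer a given watch set would land.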

Feel free to comment if you have any suggestion.
312325 No Perforce job exists for this issue. 0 312671
7 years, 5 weeks, 5 days ago 0|i1hwd3:
ZooKeeper ZOOKEEPER-1643

Windows: fetch_and_add not 64bit-compatible, may not be correct

Bug Resolved Major Fixed Erik Anderson Erik Anderson Erik Anderson 08/Feb/13 20:44   01/Aug/17 11:42 20/Feb/13 00:26 3.3.3 3.5.0, 3.4.11 c client   0 6   Windows 7
Microsoft Visual Studio 2005
Note: While I am using a really old version of ZK, I did do enough "SVN Blame" operations to realize that this code hasn't changed.

I am currently attempting to compile the C client under MSVC 2005 arch=x64. There are three things I can see with fetch_and_add() inside of /src/c/src/mt_adapter.c

(1) MSVC 2005 64-bit will not compile inline _asm sections. I'm moderately sure this code is x86-specific, so I'm not sure it should attempt to compile there either.

(2) The Windows intrinsic InterlockedExchangeAdd [http://msdn.microsoft.com/en-us/library/windows/desktop/ms683597(v=vs.85).aspx] appears to do the same thing this code is attempting to do

(3) I'm really rusty on my assembly, but why are we doing two separate XADD operations here, and is the code as-written anything approaching atomicity?

If you want an official patch, I can likely do an SVN checkout and submit a patch that replaces the entire #else on lines 495-505 with a "return InterlockedExchangeAdd(operand, incr);"

Usually when I'm scratching my head this badly there's something I'm missing though. As far as I can tell there has been no prior discussion on this code.
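For comparison only, the contract a fetch-and-add should satisfy can be stated in Java terms: a single indivisible step that adds the increment and returns the value from before the add. `AtomicInteger.getAndAdd` has this contract, and so does `InterlockedExchangeAdd` on Windows, which returns the initial value. The hypothetical `FetchAndAddDemo` below stress-tests that property. Two separate locked XADD instructions, as in the asm above, are each atomic, but the pair is not atomic, so another thread can modify the value between them.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class FetchAndAddDemo {
    /** Runs `threads` threads, each calling getAndAdd(1) `perThread` times; returns the final value. */
    static int hammer(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.getAndAdd(1); // atomic add, returns the previous value
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get(); // equals threads * perThread only if no increment was lost
    }

    public static void main(String[] args) {
        System.out.println(hammer(4, 100_000));
    }
}
```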
312042 No Perforce job exists for this issue. 1 312388
2 years, 33 weeks, 2 days ago 0|i1hum7:
ZooKeeper ZOOKEEPER-1642

Leader loading database twice

Bug Closed Major Fixed Flavio Paiva Junqueira Flavio Paiva Junqueira Flavio Paiva Junqueira 08/Feb/13 05:30   13/Mar/14 14:17 16/May/13 13:33   3.4.6, 3.5.0     0 7   The leader server currently loads the database twice: once before running leader election, when figuring out the zxid it needs for the election, and again when it starts leading. This is problematic for larger databases, so we should remove the redundant load if possible.

The code references are:

# getLastLoggedZxid() in QuorumPeer;
# loadData() in ZooKeeperServer.
311917 No Perforce job exists for this issue. 2 312263
6 years, 2 weeks ago 0|i1htun:
ZooKeeper ZOOKEEPER-1641

Using slope=positive results in a jagged ganglia graph of packets rcvd/sent

Bug Resolved Minor Fixed Ben Hartshorne Ben Hartshorne Ben Hartshorne 06/Feb/13 13:17   16/Feb/13 06:02 15/Feb/13 20:00   3.5.0 contrib   0 3   The ganglia python module uses 'slope=positive' when submitting zk_packets_received and zk_packets_sent. This results in a graph that is jagged (alternating valid results with zeros) at the highest resolution and under-represents the actual value at all averaged resolutions (>1hr).

The module should be changed to calculate the delta in requests and report requests per second instead.
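The module itself is Python, so the following is only a Java sketch of the arithmetic being proposed: take the difference between two successive cumulative counter readings and divide by the elapsed time. `PacketRate` is hypothetical. Treating a decrease as a counter reset after a server restart is an assumption, not something the issue specifies.

```java
final class PacketRate {
    private long lastCount = -1; // -1 means "no previous sample yet"
    private long lastMillis;

    /** Returns packets per second since the previous sample, or NaN for the first sample. */
    double sample(long cumulativeCount, long nowMillis) {
        double rate = Double.NaN;
        if (lastCount >= 0 && nowMillis > lastMillis) {
            long delta = cumulativeCount - lastCount;
            if (delta < 0) {
                delta = cumulativeCount; // assumed: the counter restarted from zero
            }
            rate = delta / ((nowMillis - lastMillis) / 1000.0);
        }
        lastCount = cumulativeCount;
        lastMillis = nowMillis;
        return rate;
    }
}
```

Reporting this rate with a zero or "both" slope avoids the alternating zeros that Ganglia's positive-slope handling produces.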
311603 No Perforce job exists for this issue. 1 311949
7 years, 5 weeks, 5 days ago
Reviewed
0|i1hrwv:
ZooKeeper ZOOKEEPER-1640

dynamically load command objects in zk

Improvement Resolved Minor Not A Problem Tian Hong Wang Tian Hong Wang Tian Hong Wang 05/Feb/13 04:21   26/Feb/13 03:49 26/Feb/13 03:49   3.4.5 java client   0 3   In class org.apache.zookeeper.ZooKeeperMain.java,
new CloseCommand().addToMap(commandMapCli);
new CreateCommand().addToMap(commandMapCli);
new DeleteCommand().addToMap(commandMapCli);
new DeleteAllCommand().addToMap(commandMapCli);
// Deprecated: rmr
new DeleteAllCommand("rmr").addToMap(commandMapCli);
new SetCommand().addToMap(commandMapCli);
new GetCommand().addToMap(commandMapCli);
new LsCommand().addToMap(commandMapCli);
new Ls2Command().addToMap(commandMapCli);
new GetAclCommand().addToMap(commandMapCli);
new SetAclCommand().addToMap(commandMapCli);
new StatCommand().addToMap(commandMapCli);
new SyncCommand().addToMap(commandMapCli);
new SetQuotaCommand().addToMap(commandMapCli);
new ListQuotaCommand().addToMap(commandMapCli);
new DelQuotaCommand().addToMap(commandMapCli);
new AddAuthCommand().addToMap(commandMapCli);

The above code does not scale well as more command objects are added. It would be better to refine the code to load and create the command objects dynamically.
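One way to do that, sketched under stated assumptions: the command class names come from a list (for example, read from a configuration file), each class has a no-arg constructor, and commands are keyed by the name they report. `CliCmd` and `CommandRegistry` are hypothetical stand-ins, not ZooKeeper's CliCommand API. `java.util.ServiceLoader` would be an alternative that discovers implementations from `META-INF/services` entries instead of an explicit list.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical minimal command interface for this sketch. */
interface CliCmd {
    String name();
}

final class CommandRegistry {
    /** Instantiates each class by name via its no-arg constructor and indexes it by command name. */
    static Map<String, CliCmd> load(List<String> classNames) {
        Map<String, CliCmd> commands = new LinkedHashMap<>();
        for (String className : classNames) {
            try {
                Class<? extends CliCmd> type = Class.forName(className).asSubclass(CliCmd.class);
                CliCmd cmd = type.getDeclaredConstructor().newInstance();
                commands.put(cmd.name(), cmd);
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("cannot load command class " + className, e);
            }
        }
        return commands;
    }

    public static void main(String[] args) {
        Map<String, CliCmd> cmds = load(List.of(LsCmd.class.getName(), StatCmd.class.getName()));
        System.out.println(cmds.keySet());
    }
}

final class LsCmd implements CliCmd {
    @Override
    public String name() {
        return "ls";
    }
}

final class StatCmd implements CliCmd {
    @Override
    public String name() {
        return "stat";
    }
}
```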
patch 311332 No Perforce job exists for this issue. 1 311678
7 years, 4 weeks, 2 days ago 0|i1hq8n:
ZooKeeper ZOOKEEPER-1639

zk.getZKDatabase().deserializeSnapshot adds new system znodes instead of replacing existing ones

Bug Open Major Unresolved Unassigned Alexander Shraer Alexander Shraer 02/Feb/13 22:05   08/Oct/13 16:31   3.4.5       0 3   Before the call to zk.getZKDatabase().deserializeSnapshot in Learner.java,
zk.getZKDatabase().getDataTree().getNode("/zookeeper") == zk.getZKDatabase().getDataTree().procDataNode, which means that this is the same znode, as it should be.

However, after this call, they are not equal. The node actually being used in client operations is zk.getZKDatabase().getDataTree().getNode("/zookeeper"), but the other old node procDataNode is still there and not replaced (in fact it is a final field).
311056 No Perforce job exists for this issue. 0 311401
6 years, 24 weeks, 2 days ago 0|i1hoj3:
ZooKeeper ZOOKEEPER-1638

Redundant zk.getZKDatabase().clear();

Improvement Resolved Trivial Fixed neil bhakta Alexander Shraer Alexander Shraer 02/Feb/13 18:02   12/Mar/14 20:39 12/Mar/14 18:58   3.5.0     0 7   Learner.syncWithLeader calls zk.getZKDatabase().clear() right before zk.getZKDatabase().deserializeSnapshot(leaderIs), and the first thing deserializeSnapshot does is call clear() again.

Suggest removing the clear() call in syncWithLeader.

newbie 311044 No Perforce job exists for this issue. 2 311389
6 years, 2 weeks ago 0|i1hogf:
ZooKeeper ZOOKEEPER-1637

Intermittent Segfault with zkpython in pyzoo_exists

Bug Open Major Unresolved Unassigned Robert Schultheis Robert Schultheis 01/Feb/13 15:29   03/Feb/13 02:32   3.4.3, 3.4.4, 3.4.5       0 1   We are getting an intermittent segfault. This is on OS X, with ZooKeeper compiled using Homebrew. I've tried 3.4.3 through 3.4.5.

I used GDB to get the following backtrace:

{code}
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: 13 at address: 0x0000000000000000
[Switching to process 10366 thread 0x1d03]
0x00007fff8e0984f0 in strlen ()
(gdb) backtrace
#0 0x00007fff8e0984f0 in strlen ()
#1 0x00000001004983cc in prepend_string ()
#2 0x0000000100498451 in Request_path_init ()
#3 0x0000000100499e94 in zoo_awexists ()
#4 0x000000010049a036 in zoo_wexists ()
#5 0x000000010048170b in pyzoo_exists ()
#6 0x000000010008c5d8 in PyEval_EvalFrameEx ()
#7 0x000000010008ecd8 in PyEval_EvalCodeEx ()
#8 0x000000010008ee6c in PyEval_EvalCode ()
#9 0x000000010008be0a in PyEval_EvalFrameEx ()
#10 0x000000010008ecd8 in PyEval_EvalCodeEx ()
#11 0x000000010008ee6c in PyEval_EvalCode ()
#12 0x000000010008be0a in PyEval_EvalFrameEx ()
#13 0x000000010008ecd8 in PyEval_EvalCodeEx ()
#14 0x000000010002cabf in PyClassMethod_New ()
#15 0x000000010000bd32 in PyObject_Call ()
#16 0x000000010008c5ec in PyEval_EvalFrameEx ()
#17 0x000000010008ecd8 in PyEval_EvalCodeEx ()
#18 0x000000010002cabf in PyClassMethod_New ()
#19 0x000000010000bd32 in PyObject_Call ()
#20 0x000000010001a6e9 in PyInstance_New ()
#21 0x000000010000bd32 in PyObject_Call ()
#22 0x0000000100055c5d in _PyObject_SlotCompare ()
#23 0x000000010000bd32 in PyObject_Call ()
#24 0x000000010008bf63 in PyEval_EvalFrameEx ()
#25 0x000000010008ecd8 in PyEval_EvalCodeEx ()
#26 0x000000010008ee6c in PyEval_EvalCode ()
#27 0x000000010008be0a in PyEval_EvalFrameEx ()
#28 0x000000010008edf7 in PyEval_EvalCode ()
#29 0x000000010008be0a in PyEval_EvalFrameEx ()
#30 0x000000010008ecd8 in PyEval_EvalCodeEx ()
#31 0x000000010002cabf in PyClassMethod_New ()
#32 0x000000010000bd32 in PyObject_Call ()
#33 0x000000010001a6e9 in PyInstance_New ()
#34 0x000000010000bd32 in PyObject_Call ()
#35 0x0000000100087c40 in PyEval_CallObjectWithKeywords ()
#36 0x00000001000b940d in initthread ()
#37 0x00007fff8e0448bf in _pthread_start ()
#38 0x00007fff8e047b75 in thread_start ()
{code}

310930 No Perforce job exists for this issue. 0 311275
7 years, 7 weeks, 4 days ago 0|i1hnr3:
ZooKeeper ZOOKEEPER-1636

c-client crash when zoo_amulti failed

Bug Closed Critical Fixed Michael K. Edwards Thawan Kooburat Thawan Kooburat 30/Jan/13 22:54   01/Aug/19 20:50 10/Dec/18 09:29 3.4.3 3.6.0, 3.5.5, 3.4.15 c client   0 5 0 9000   deserialize_response for the multi operation does not handle the case where the server fails to send back a response (e.g., when the multi packet is too large).

The C client then tries to process the completions of all sub-requests as if the operation had succeeded, and eventually crashes with SIGSEGV.
100% 100% 9000 0 pull-request-available 310569 No Perforce job exists for this issue. 5 310914
1 year, 14 weeks, 3 days ago 0|i1hljb:
ZooKeeper ZOOKEEPER-1635

ZooKeeper C client doesn't compile on 64 bit Windows

Improvement Resolved Major Invalid Unassigned Tomas Gutierrez Tomas Gutierrez 30/Jan/13 16:02   24/Apr/14 17:15 23/Apr/14 18:02   3.5.0     10 8   Windows x64 systems. The x64 target does not support inline _asm (see: http://msdn.microsoft.com/en-us/library/4ks26t93(v=vs.80).aspx)

The proposal is to use the native Windows function, which is valid for both the i386 and x64 architectures.

To avoid any potential breakage, a compilation directive (WIN32_NOASM) has been added, but the best solution would be to remove the asm part entirely.


-----------
sample code
-----------


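int32_t fetch_and_add(volatile int32_t* operand, int incr)
{
#ifndef WIN32
    int32_t result;
    asm __volatile__(
        "lock xaddl %0,%1\n"
        : "=r"(result), "=m"(*(int *)operand)
        : "0"(incr)
        : "memory");
    return result; /* value of *operand before the add */
#else

#ifdef WIN32_NOASM
    /* InterlockedExchangeAdd atomically adds incr and returns the initial value,
       matching the xaddl branch above; re-reading *operand afterwards would not be atomic. */
    return InterlockedExchangeAdd((volatile LONG *)operand, incr);
#else
    volatile int32_t result;
    _asm
    {
        mov eax, operand; // eax = operand
        mov ebx, incr;    // ebx = incr
        mov ecx, 0x0;     // ecx = 0
        lock xadd dword ptr [eax], ecx; // ecx = old *operand, *operand unchanged
        lock xadd dword ptr [eax], ebx; // *operand += incr (a second, separate atomic step)
        mov result, ecx;  // result = ecx
    }
    return result;
#endif

#endif
}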
310494 No Perforce job exists for this issue. 0 310839
5 years, 48 weeks ago 0|i1hl2n:
ZooKeeper ZOOKEEPER-1634

A new feature proposal to ZooKeeper: authentication enforcement

New Feature Resolved Major Fixed Michael Han Jaewoong Choi Jaewoong Choi 30/Jan/13 14:35   26/Sep/19 19:44 24/Jul/19 12:01 3.4.5 3.6.0 security, server   4 14 259200 243000 16200 6% Up to version 3.4.5, ZooKeeperServer does not enforce authentication if the client doesn't provide any auth info through ZooKeeper#addAuthInfo. Hence, every znode should have at least one ACL assigned; otherwise any unauthenticated client can do anything to it.

The current authentication/authorization mechanism of ZooKeeper described above has several issues:
1. From a security standpoint, a malicious client can access any znode that doesn't have proper access control set.
2. From a runtime performance standpoint, authorization is always evaluated, unnecessarily, for every znode on every operation against a client that bypassed the authentication phase.

In other words, the current mechanism does not address the following requirement:
"We want to protect a ZK server by enforcing simple authentication for every client, no matter which znode it is trying to access. Every connection (or operation) from the client should be rejected rather than established if it doesn't come with valid authentication information. As we make no other distinction between znodes in terms of authorization, we don't want any ACLs on any znode."

To address these issues, we propose a feature called "authentication enforcement". The idea is described, roughly but clearly, in the attached patch (zookeeper_3.4.5_patch_for_authentication_enforcement.patch), which makes ZooKeeperServer enforce authentication based on two configuration settings, authenticationEnforced (boolean) and enforcedAuthenticationScheme (string), for every operation coming through ZooKeeperServer#processPacket except OpCode.auth. The base of the patch is "http://svn.apache.org/repos/asf/zookeeper/tags/release-3.4.5/"
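The core of the proposed check can be sketched as a gate applied before normal request processing, using the two settings named above. Everything here is hypothetical: `AuthEnforcementSketch` and its `Op` enum stand in for ZooKeeperServer#processPacket and ZooKeeper's numeric opcodes. The gate also assumes the set of schemes a session has authenticated with is available to it.

```java
import java.util.Set;

final class AuthEnforcementSketch {
    /** Illustrative operation kinds; stand-ins for ZooKeeper's numeric opcodes. */
    enum Op { AUTH, CREATE, GET_DATA, SET_DATA, DELETE }

    private final boolean authenticationEnforced;
    private final String enforcedAuthenticationScheme;

    AuthEnforcementSketch(boolean authenticationEnforced, String enforcedAuthenticationScheme) {
        this.authenticationEnforced = authenticationEnforced;
        this.enforcedAuthenticationScheme = enforcedAuthenticationScheme;
    }

    /**
     * Decides whether a request may proceed to normal (ACL-based) processing.
     * The auth operation itself is always admitted, so a client can authenticate first.
     */
    boolean admit(Op op, Set<String> sessionAuthSchemes) {
        if (!authenticationEnforced || op == Op.AUTH) {
            return true;
        }
        return sessionAuthSchemes.contains(enforcedAuthenticationScheme);
    }
}
```

With enforcement on, a rejected request would be answered with an authentication error and no ACL evaluation, which addresses both the security and the performance points above.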
6% 6% 16200 243000 259200 pull-request-available 310474 No Perforce job exists for this issue. 1 310819
34 weeks ago authentication 0|i1hky7:
ZooKeeper ZOOKEEPER-1633

Introduce a protocol version to connection initiation message

Bug Closed Major Fixed Alexander Shraer Alexander Shraer Alexander Shraer 30/Jan/13 14:14   13/Mar/14 14:16 02/Apr/13 02:32   3.4.6 server   0 6   Currently the first message a server sends to another server includes just one field: the server's id (a long). This is in QuorumCnxManager.java. This makes it very difficult to change the information passed during this initial connection. This patch changes the first field of the message to a protocol version (a negative number, which can't be a server id). The second field will be the server id, and the third field the number of bytes in the remainder of the message. A 3.4 server will read the first field as before, but if it is negative it will read the second field to find the server id and then remove the remainder of the message from the stream. This will not affect 3.4 itself, since 3.4 and earlier servers send just the server id (so the code in the patch will not run unless a server newer than 3.4 tries to connect). It will, however, provide the necessary flexibility for future releases, as well as an upgrade path from 3.4. 310464 No Perforce job exists for this issue. 5 310809
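The backward-compatible read path described in ZOOKEEPER-1633 can be sketched with `DataInputStream`. The field widths (a long version, a long server id, an int length) and the marker value -65536 are assumptions for this sketch; the issue only requires the version to be negative. `InitialMessageSketch` is a hypothetical class, not QuorumCnxManager, and the size limit on the skipped remainder is an added safety assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class InitialMessageSketch {
    static final long PROTOCOL_VERSION = -65536L; // assumed marker; any negative value works
    static final int MAX_REMAINDER = 4096;         // assumed sanity limit on the skipped part

    /** Old format: just the server id. */
    static byte[] encodeLegacy(long sid) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(sid);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** New format: version, server id, length of the remainder, remainder bytes. */
    static byte[] encodeVersioned(long sid, byte[] remainder) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(PROTOCOL_VERSION);
            out.writeLong(sid);
            out.writeInt(remainder.length);
            out.write(remainder);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads either format and returns the server id, discarding any remainder. */
    static long readServerId(DataInputStream in) throws IOException {
        long first = in.readLong();
        if (first >= 0) {
            return first; // legacy sender: the first field is the server id
        }
        long sid = in.readLong();
        int remaining = in.readInt();
        if (remaining < 0 || remaining > MAX_REMAINDER) {
            throw new IOException("unreasonable remainder length " + remaining);
        }
        in.readFully(new byte[remaining]); // skip fields this reader does not understand
        return sid;
    }

    static long readServerId(byte[] message) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(message))) {
            return readServerId(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```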
6 years, 2 weeks ago 0|i1hkvz:
ZooKeeper ZOOKEEPER-1632

fix memory leaks in cli_st

Bug Closed Minor Fixed Flavio Paiva Junqueira Colin McCabe Colin McCabe 29/Jan/13 17:49   13/Mar/14 14:17 04/Dec/13 05:29   3.4.6, 3.5.0 c client   0 6   Fix two memory leaks revealed by running:
{code}
valgrind --leak-check=full ./.libs/cli_st 127.0.0.1:2182
create /foo
quit
{code}
310304 No Perforce job exists for this issue. 4 310649
6 years, 2 weeks ago The fix for this issue solves the memory leak spotted in the absence of errors. In the case the completion function is not registered because of an error (e.g., see zoo_async), the line duplicate won't be freed. 0|i1hjwf:
ZooKeeper ZOOKEEPER-1631

cppunit test TestOperations.cc fails

Bug Open Minor Unresolved Unassigned Colin McCabe Colin McCabe 29/Jan/13 14:19   29/Jan/13 14:19   3.4.6       0 1   I tried running "make run-check" on the cppunit tests, and got the following error:

{code}
tests/TestOperations.cc:270: Assertion: equality assertion failed [Expected: 1, Actual : 0]
tests/TestOperations.cc:339: Assertion: assertion failed [Expression: timeMock==zh->last_recv]
tests/TestOperations.cc:407: Assertion: equality assertion failed [Expected: 1, Actual : 0]
tests/TestOperations.cc:212: Assertion: equality assertion failed [Expected: -7, Actual : 0]
{code}

I thought this might be an environment issue, but I was able to reproduce it on both Ubuntu 12.04 and OpenSUSE 12.1
310250 No Perforce job exists for this issue. 0 310595
7 years, 8 weeks, 2 days ago 0|i1hjkf:
ZooKeeper ZOOKEEPER-1630

collect the zk connects/disconnects every cycle and report it to controller.

Improvement Resolved Major Invalid Unassigned kishore gopalakrishna kishore gopalakrishna 25/Jan/13 19:25   25/Jan/13 19:28 25/Jan/13 19:28         0 1   The Helix agent must collect the ZK connects/disconnects and use the health check framework to convey the information. The controller must disable nodes that connect/disconnect frequently. 309684 No Perforce job exists for this issue. 0 302835
7 years, 8 weeks, 6 days ago 0|i1g7nz:
ZooKeeper ZOOKEEPER-1629

testTransactionLogCorruption occasionally fails

Bug Closed Major Fixed Alexander Shraer Alexander Shraer Alexander Shraer 24/Jan/13 20:10   13/Mar/14 14:17 14/Jul/13 22:07   3.4.6, 3.5.0 tests   0 10   It seems that testTransactionLogCorruption is very flaky; for example, it fails here:

https://builds.apache.org/job/ZooKeeper-trunk-jdk7/500/
https://builds.apache.org/job/ZooKeeper-trunk-jdk7/502/
https://builds.apache.org/job/ZooKeeper-trunk-jdk7/503/#showFailuresLink

It also fails on older builds (no longer on the website), for example all builds from 381 to 399.
309042 No Perforce job exists for this issue. 6 289702
6 years, 2 weeks ago 0|i1dylj:
ZooKeeper ZOOKEEPER-1628

Documented list of allowable characters in ZK doc not in line with code

Bug Resolved Major Fixed Gabriel Reid Gabriel Reid Gabriel Reid 24/Jan/13 09:19   25/Jan/13 02:09 25/Jan/13 02:09   3.5.0 documentation, java client   0 2   The documented set of allowable characters in ZooKeeper node names in the Programmer's Guide is not entirely in line with the code.

The range of non-printable ASCII characters in the doc ends too early (i.e. 0x19 instead of 0x1F).

The range-checking code in PathUtils also contains off-by-one errors, so characters on the border of the disallowed ranges are actually allowed by the code.
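The off-by-one problem disappears when every range bound is checked inclusively. The sketch below is a hypothetical helper, not the PathUtils code; the disallowed ranges (the null character, 0x01 to 0x1F, 0x7F to 0x9F, 0xD800 to 0xF8FF, and 0xFFF0 to 0xFFFF) are assumptions based on the Programmer's Guide.

```java
class PathCharCheck {
    // Hypothetical helper, not ZooKeeper's PathUtils. The ranges are assumptions based on
    // the Programmer's Guide; the point is that every bound is INCLUSIVE, so the border
    // characters (e.g. 0x1F, 0x9F) are rejected too.
    static boolean isDisallowed(char c) {
        return c == 0x00
                || (c >= 0x01 && c <= 0x1F)
                || (c >= 0x7F && c <= 0x9F)
                || (c >= 0xD800 && c <= 0xF8FF)
                || (c >= 0xFFF0 && c <= 0xFFFF);
    }

    public static void main(String[] args) {
        System.out.println(isDisallowed((char) 0x1F)); // true: last control character
        System.out.println(isDisallowed((char) 0x20)); // false: space is allowed
        System.out.println(isDisallowed((char) 0x9F)); // true: upper border of the second range
    }
}
```

Writing the checks as `>=`/`<=` pairs against the documented endpoints makes it easy to compare the code with the documentation line by line.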
308805 No Perforce job exists for this issue. 1 284551
7 years, 8 weeks, 6 days ago
Reviewed
0|i1d3mn:
ZooKeeper ZOOKEEPER-1627

Add org.apache.zookeeper.common to exported packages in OSGi MANIFEST headers

Improvement Closed Major Fixed Arnoud Glimmerveen Arnoud Glimmerveen Arnoud Glimmerveen 24/Jan/13 02:53   13/Mar/14 14:17 09/Oct/13 18:14 3.4.5 3.4.6, 3.5.0     4 5   Java: 1.6.0_31
OSGi environment: Karaf 2.3.0
The utilities contained in the org.apache.zookeeper.common package are not part of the exported packages in an OSGi environment, making them unavailable to other bundles using ZooKeeper.
I propose adding the org.apache.zookeeper.common package to the Export-Package MANIFEST header.
308637 No Perforce job exists for this issue. 2 282489
6 years, 2 weeks ago
Reviewed
0|i1cqwf:
ZooKeeper ZOOKEEPER-1626

ZOOKEEPER-1366 Zookeeper C client should be tolerant of clock adjustments

Sub-task Resolved Major Fixed Colin McCabe Colin McCabe Colin McCabe 21/Jan/13 14:10   28/Aug/17 07:30 20/Jun/15 19:58   3.5.1, 3.6.0 c client   1 13   The Zookeeper C client should use monotonic time when available, in order to be more tolerant of time adjustments. 307051 No Perforce job exists for this issue. 7 267152
2 years, 29 weeks, 3 days ago 0|i1a48f:
ZooKeeper ZOOKEEPER-1625

zkServer.sh is looking for clientPort in config file, but it may no longer be there with ZK-1411

Bug Resolved Major Fixed Alexander Shraer Alexander Shraer Alexander Shraer 19/Jan/13 16:33   23/Jan/13 14:24 22/Jan/13 21:56 3.5.0 3.5.0 scripts   0 4   zkServer.sh currently looks for the "clientPort" entry in the static configuration file and uses it to contact the server.

With ZOOKEEPER-1411 clientPort is part of the dynamic configuration, and may appear in the separate dynamic configuration file. The "clientPort" entry may no longer be in the static config file.

With the proposed patch, zkServer.sh first looks in the old (static) config file; if clientPort is not there, it determines the server's id from the myid file and uses that id to find the client port in the dynamic config file.
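The lookup order can be sketched in Java for clarity (the actual patch is a shell change to zkServer.sh). The method name is hypothetical, the inputs are the file contents passed as strings, and the dynamic-config entry format `server.<id>=host:port:port[:role];[address:]clientPort` is an assumption based on the ZOOKEEPER-1411 reconfig syntax.

```java
import java.util.Optional;

class ClientPortLookup {
    // Hypothetical sketch of the lookup order; not zkServer.sh itself.
    // Assumes dynamic entries look like "server.<id>=host:p1:p2[:role];[addr:]clientPort".
    static Optional<String> findClientPort(String staticCfg, String myId, String dynamicCfg) {
        // 1. Old style: clientPort=... in the static config file.
        for (String line : staticCfg.split("\n")) {
            line = line.trim();
            if (line.startsWith("clientPort=")) {
                return Optional.of(line.substring("clientPort=".length()).trim());
            }
        }
        // 2. Otherwise, use the id from the myid file to find this server's dynamic entry.
        String prefix = "server." + myId.trim() + "=";
        for (String line : dynamicCfg.split("\n")) {
            line = line.trim();
            int semi = line.indexOf(';');
            if (line.startsWith(prefix) && semi >= 0) {
                String clientPart = line.substring(semi + 1).trim();
                // The client part may be "port" or "address:port"; keep only the port.
                int colon = clientPart.lastIndexOf(':');
                return Optional.of(colon >= 0 ? clientPart.substring(colon + 1) : clientPart);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String staticCfg = "tickTime=2000\ndataDir=/tmp/zk\n";
        String dynamicCfg = "server.1=localhost:2888:3888:participant;2181\n"
                + "server.2=localhost:2889:3889:participant;0.0.0.0:2182\n";
        System.out.println(findClientPort(staticCfg, "2", dynamicCfg).orElse("none")); // 2182
    }
}
```

Matching on the full `server.<id>=` prefix (including the `=`) keeps id 1 from accidentally matching `server.10`.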
305583 No Perforce job exists for this issue. 1 257309
7 years, 9 weeks, 1 day ago 0|i18fh3:
ZooKeeper ZOOKEEPER-1624

PrepRequestProcessor abort multi-operation incorrectly

Bug Closed Critical Fixed Thawan Kooburat Thawan Kooburat Thawan Kooburat 17/Jan/13 22:11   13/Mar/14 14:16 10/Oct/13 15:06   3.4.6, 3.5.0 server   0 6   We found this issue when trying to issue multiple instances of the following multi-op concurrently:

multi {
1. create sequential node /a-
2. create node /b
}

The expected result is that only the first multi-op request should succeed and the rest of the requests should fail because /b already exists.

However, the subsequent multi-ops were reported to fail because the sequential node creation failed, which should not be possible.

Below are the return codes for each sub-op when issuing 3 instances of the above multi-op asynchronously:

1. ZOK, ZOK
2. ZOK, ZNODEEXISTS
3. ZNODEEXISTS, ZRUNTIMEINCONSISTENCY

After adding more debug logging, I found the cause: PrepRequestProcessor rolls back the outstandingChanges of the second multi-op incorrectly, causing sequential node name generation to be incorrect. Below are the sequential node names generated by PrepRequestProcessor:

1. create /a-0001
2. create /a-0003
3. create /a-0001

The bug is in the getPendingChanges() method. It fails to copy the ChangeRecord for the parent node ("/"), so rollbackPendingChanges() cannot restore the correct previous change record of the parent node when aborting the second multi-op.

The impact of this bug is that sequential node creation on the same parent node may fail until the previous one is committed. I am not sure whether there are other implications.
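The rollback problem can be modelled in a few lines. In this hypothetical sketch (class and method names are illustrative, not PrepRequestProcessor's), a map of pending parent counters shadows the committed state, and the counter drives the sequential suffix. If the aborted multi-op's snapshot omits the parent, rollback discards multi #1's still-pending increment and the next sequential create reuses a name. The exact numbers differ from the report; only the mechanism is illustrated.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical model of the rollback described above; not PrepRequestProcessor's code.
// Only the parent's child counter (cversion) is modelled, since it drives sequential names.
class PendingChangesSketch {
    final Map<String, Integer> committedCversion = new HashMap<>(); // state already applied
    final Map<String, Integer> pendingCversion = new HashMap<>();   // outstanding changes

    int currentCversion(String parent) {
        // Outstanding (uncommitted) changes shadow the committed state.
        return pendingCversion.getOrDefault(parent, committedCversion.getOrDefault(parent, 0));
    }

    String createSequential(String parent, String prefix) {
        int cv = currentCversion(parent);
        pendingCversion.put(parent, cv + 1);
        return prefix + String.format(Locale.ROOT, "%010d", cv);
    }

    // Save the pending records a multi-op may touch (null = no pending record).
    Map<String, Integer> snapshot(String... paths) {
        Map<String, Integer> saved = new HashMap<>();
        for (String p : paths) {
            saved.put(p, pendingCversion.get(p));
        }
        return saved;
    }

    // Drop everything the aborted multi-op touched, then restore what was saved.
    void rollback(List<String> touched, Map<String, Integer> saved) {
        for (String p : touched) {
            pendingCversion.remove(p);
        }
        saved.forEach((p, v) -> {
            if (v != null) {
                pendingCversion.put(p, v);
            }
        });
    }

    public static void main(String[] args) {
        for (boolean saveParent : new boolean[] {false, true}) {
            PendingChangesSketch s = new PendingChangesSketch();
            String first = s.createSequential("/", "/a-");   // multi #1, not yet committed
            Map<String, Integer> saved = saveParent ? s.snapshot("/") : s.snapshot();
            s.createSequential("/", "/a-");                   // multi #2 ...
            s.rollback(List.of("/"), saved);                  // ... aborts because /b exists
            String third = s.createSequential("/", "/a-");   // multi #3
            System.out.println("parent saved=" + saveParent + ": " + first + " vs " + third);
        }
    }
}
```

Without the parent in the snapshot, both lines print the same name for multi #1 and multi #3; with it, multi #3 gets a fresh name.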
zk-review 305000 No Perforce job exists for this issue. 6 254379
6 years, 2 weeks ago 0|i17xdz:
ZooKeeper ZOOKEEPER-1623

Authentication using SASL

Bug Open Major Unresolved Unassigned Christian Wuertz Christian Wuertz 16/Jan/13 11:46   22/Oct/13 14:39   3.4.5       1 3   First of all, I'm just running some tests and thus I don't want or need any authentication at all, so I didn't configure any. But when running my Java client with an Oracle JVM (1.6.38), I run into the following problem:

{noformat}
2013-01-16 17:40:30,659 [main] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=192.168.2.28:2181 sessionTimeout=5000 watcher=master.Master@eb42cbf
2013-01-16 17:40:30,674 [main] DEBUG org.apache.zookeeper.ClientCnxn - zookeeper.disableAutoWatchReset is false
2013-01-16 17:40:30,698 [Thread-0] DEBUG master.Master - Master waits...
2013-01-16 17:40:30,701 [main-SendThread(Teots-PC:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server Teots-PC/192.168.2.28:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2013-01-16 17:40:30,706 [main-SendThread(Teots-PC:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to Teots-PC/192.168.2.28:2181, initiating session
2013-01-16 17:40:30,708 [main-SendThread(Teots-PC:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Session establishment request sent on Teots-PC/192.168.2.28:2181
2013-01-16 17:40:30,709 [main-SendThread(Teots-PC:2181)] DEBUG org.apache.zookeeper.client.ZooKeeperSaslClient - Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration
2013-01-16 17:40:30,730 [main-SendThread(Teots-PC:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server Teots-PC/192.168.2.28:2181, sessionid = 0x13c44254fd70003, negotiated timeout = 5000
2013-01-16 17:40:30,732 [main-EventThread] DEBUG master.Master - Master recieved an event: None
2013-01-16 17:40:30,732 [main-SendThread(Teots-PC:2181)] DEBUG org.apache.zookeeper.client.ZooKeeperSaslClient - Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration
2013-01-16 17:40:30,732 [main-EventThread] DEBUG master.Master - Master's state: SyncConnected
2013-01-16 17:40:30,732 [main-SendThread(Teots-PC:2181)] DEBUG org.apache.zookeeper.client.ZooKeeperSaslClient - Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration
{noformat}

This does not happen with an OpenJDK JVM.
304707 No Perforce job exists for this issue. 0 254068
6 years, 22 weeks, 2 days ago 0|i17vh3:
ZooKeeper ZOOKEEPER-1622

session ids will be negative in the year 2022

Bug Closed Trivial Fixed Eric C. Newton Eric C. Newton Eric C. Newton 16/Jan/13 11:29   13/Mar/14 14:16 16/Dec/13 01:30 3.4.0, 3.5.0 3.4.6, 3.5.0     0 5   Someone decided to use a large number for their myid file. This caused session ids to go negative, and our software (Apache Accumulo) did not handle this very well. While diagnosing the problem, I noticed this in SessionImpl:

{noformat}
public static long initializeNextSession(long id) {
    long nextSid = 0;
    nextSid = (System.currentTimeMillis() << 24) >> 8;
    nextSid = nextSid | (id << 56);
    return nextSid;
}
{noformat}

When the 40th bit of System.currentTimeMillis() is a one, sign extension fills the upper 8 bits of nextSid, so OR-ing in the id no longer makes the session id unique. I recommend changing the arithmetic right shift (>>) to a logical right shift (>>>):

{noformat}
public static long initializeNextSession(long id) {
    long nextSid = 0;
    nextSid = (System.currentTimeMillis() << 24) >>> 8;
    nextSid = nextSid | (id << 56);
    return nextSid;
}
{noformat}

But we have until the year 2022 before we have to worry about it.
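To see the effect without waiting for 2022, the sketch below passes the clock in as a parameter (an assumption made for reproducibility; SessionImpl reads System.currentTimeMillis() directly) and compares both variants at 1_650_000_000_000 ms, a timestamp in April 2022 whose 40th bit is set.

```java
class SessionIdShift {
    // The two variants quoted above, with the clock passed in so results are reproducible.
    static long arithmeticShift(long id, long millis) {
        return ((millis << 24) >> 8) | (id << 56);
    }

    static long logicalShift(long id, long millis) {
        return ((millis << 24) >>> 8) | (id << 56);
    }

    public static void main(String[] args) {
        long t = 1_650_000_000_000L; // April 2022: bit 39 (the 40th bit) is set
        System.out.println(arithmeticShift(1, t) == arithmeticShift(2, t)); // true: ids collide
        System.out.println(logicalShift(1, t) == logicalShift(2, t));       // false
        System.out.println(logicalShift(1, t) >>> 56);                      // 1: id in top byte
    }
}
```

Note that either variant still yields negative ids when myid is 128 or larger, because `id << 56` then sets the sign bit itself.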
304699 No Perforce job exists for this issue. 1 253877
6 years, 2 weeks ago
Reviewed
0|i17uan:
ZooKeeper ZOOKEEPER-1621

ZooKeeper does not recover from crash when disk was full

Bug Patch Available Major Unresolved Michi Mutsuzaki David Arthur David Arthur 16/Jan/13 10:24   05/Feb/20 07:11   3.4.3 3.7.0, 3.5.8 server   7 26 0 7200   Ubuntu 12.04, Amazon EC2 instance The disk that ZooKeeper was using filled up. During a snapshot write, I got the following exception:

2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - Severe unrecoverable error, exiting
java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:282)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
at org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)

Then many subsequent exceptions like:

2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was partial.
2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected exception, exiting abnormally
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:504)
at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)


It seems to me that writing the transaction log should be fully atomic to avoid such situations. Is this not the case?
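The transaction log is appended to rather than rewritten, so the following does not apply to it directly, but for whole files such as snapshots a common technique is to write to a temporary file, force it to disk, and atomically rename it over the target. This is a generic sketch, not what ZooKeeper does; the class name is hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

class AtomicFileWrite {
    // Write to a temp file in the same directory, force it to disk, then rename it over
    // the target. If the disk fills up mid-write, the IOException leaves the target
    // untouched (old contents or absent) and the temp file is cleaned up.
    static void writeAtomically(Path target, byte[] data) throws IOException {
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, target.getFileName().toString(), ".tmp");
        try {
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
                ByteBuffer buf = ByteBuffer.wrap(data);
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
                ch.force(true); // flush data and metadata before the rename
            }
            // On POSIX this is rename(2), which replaces an existing target atomically.
            // Full durability would also fsync the directory; omitted in this sketch.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            Files.deleteIfExists(tmp); // no-op after a successful move
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("zk-demo");
        Path snap = dir.resolve("snapshot.demo");
        writeAtomically(snap, "first".getBytes(StandardCharsets.UTF_8));
        writeAtomically(snap, "second".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(Files.readAllBytes(snap), StandardCharsets.UTF_8)); // second
    }
}
```

A reader then sees either the previous complete file or the new complete file, never a truncated one.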

100% 100% 7200 0 pull-request-available 304627 No Perforce job exists for this issue. 3 252823
1 year, 13 weeks, 3 days ago 0|i17nsf:
ZooKeeper ZOOKEEPER-1620

NIOServerCnxnFactory (new code introduced in ZK-1504) opens selectors but never closes them

Bug Resolved Major Fixed Thawan Kooburat Alexander Shraer Alexander Shraer 14/Jan/13 23:18   01/May/13 22:30 25/Jan/13 01:47 3.5.0 3.5.0 server   0 4   New code (committed in ZK-1504) opens selectors but doesn't close them.
Specifically, AbstractSelectThread does the following in its constructor:

this.selector = Selector.open();

This may also happen elsewhere. Tests fail for me with the following message:

java.io.IOException: Too many open files
at sun.nio.ch.EPollArrayWrapper.epollCreate(Native Method)
at sun.nio.ch.EPollArrayWrapper.<init>(EPollArrayWrapper.java:69)
at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:52)
at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18)
at java.nio.channels.Selector.open(Selector.java:209)
at org.apache.zookeeper.server.NIOServerCnxnFactory$AbstractSelectThread.<init>(NIOServerCnxnFactory.java:128)
at org.apache.zookeeper.server.NIOServerCnxnFactory$AcceptThread.<init>(NIOServerCnxnFactory.java:177)
at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:663)
at org.apache.zookeeper.server.ServerCnxnFactory.createFactory(ServerCnxnFactory.java:127)
at org.apache.zookeeper.server.quorum.QuorumPeer.<init>(QuorumPeer.java:709)
at org.apache.zookeeper.test.QuorumBase.startServers(QuorumBase.java:177)
at org.apache.zookeeper.test.QuorumBase.setUp(QuorumBase.java:113)
at org.apache.zookeeper.test.QuorumBase.setUp(QuorumBase.java:71)
at org.apache.zookeeper.test.ReconfigTest.setUp(ReconfigTest.java:56)
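One way to avoid the leak is for each select thread to own its Selector and close it when run() exits. The class below is hypothetical, not NIOServerCnxnFactory's AbstractSelectThread. Note that a thread constructed but never started (as on the configure() path in the trace above) would still hold its selector, so the owning factory must also close it in that case.

```java
import java.io.IOException;
import java.nio.channels.Selector;

// Hypothetical select thread, not NIOServerCnxnFactory's AbstractSelectThread. It owns
// its Selector and closes it when run() exits, releasing the epoll file descriptor.
class ClosingSelectThread extends Thread {
    private final Selector selector;
    private volatile boolean stopped = false;

    ClosingSelectThread(String name) throws IOException {
        super(name);
        this.selector = Selector.open(); // allocates an OS-level descriptor
    }

    @Override
    public void run() {
        try {
            while (!stopped) {
                selector.select(100); // wait up to 100 ms for ready keys
                selector.selectedKeys().clear();
            }
        } catch (IOException e) {
            // A real implementation would log this.
        } finally {
            try {
                selector.close(); // runs on both normal and error exits
            } catch (IOException ignored) {
                // nothing more can be done here
            }
        }
    }

    void shutdown() {
        stopped = true;
        selector.wakeup(); // interrupt a blocking select()
    }

    boolean selectorOpen() {
        return selector.isOpen();
    }

    public static void main(String[] args) throws InterruptedException, IOException {
        ClosingSelectThread t = new ClosingSelectThread("select-demo");
        t.start();
        t.shutdown();
        t.join();
        System.out.println(t.selectorOpen()); // false
    }
}
```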
304375 No Perforce job exists for this issue. 2 252551
7 years, 8 weeks, 6 days ago
Reviewed
0|i17m3z:
ZooKeeper ZOOKEEPER-1619

Allow spaces in URL

Improvement Resolved Minor Fixed Edward Ribeiro Todd Nine Todd Nine 11/Jan/13 10:57   25/Jan/13 01:55 25/Jan/13 01:55 3.4.5, 3.5.0 3.5.0 java client   0 3   Currently, spaces are not allowed in the connect string URL. This format works:

{code}
10.10.1.1:2181,10.10.1.2:2181/usergrid
{code}

This format does not (note the spaces around the comma):

{code}
10.10.1.1:2181 , 10.10.1.2:2181/usergrid
{code}

Please add a trim to both the port and the hostname parsing.
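A minimal sketch of the requested trimming, assuming a connect string of comma-separated `host[:port]` entries with an optional chroot suffix and a default port of 2181. The class is hypothetical, not ZooKeeper's ConnectStringParser, and does not handle IPv6 literals.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser illustrating the requested trimming; not ZooKeeper's
// ConnectStringParser. IPv6 literals are not handled.
class TrimmingConnectStringParser {
    static final int DEFAULT_PORT = 2181;

    final List<InetSocketAddress> servers = new ArrayList<>();
    final String chrootPath; // null when there is no chroot suffix

    TrimmingConnectStringParser(String connectString) {
        int slash = connectString.indexOf('/');
        String hostList = slash >= 0 ? connectString.substring(0, slash) : connectString;
        chrootPath = slash >= 0 ? connectString.substring(slash).trim() : null;
        for (String entry : hostList.split(",")) {
            entry = entry.trim(); // tolerate spaces around the comma
            if (entry.isEmpty()) {
                continue;
            }
            int colon = entry.lastIndexOf(':');
            String host = colon >= 0 ? entry.substring(0, colon).trim() : entry;
            int port = colon >= 0 ? Integer.parseInt(entry.substring(colon + 1).trim()) : DEFAULT_PORT;
            // createUnresolved avoids a DNS lookup while parsing.
            servers.add(InetSocketAddress.createUnresolved(host, port));
        }
    }

    public static void main(String[] args) {
        TrimmingConnectStringParser p =
                new TrimmingConnectStringParser("10.10.1.1:2181 , 10.10.1.2:2181/usergrid");
        for (InetSocketAddress a : p.servers) {
            System.out.println(a.getHostString() + ":" + a.getPort());
        }
        System.out.println("chroot=" + p.chrootPath); // chroot=/usergrid
    }
}
```

Trimming both the host and the port separately also accepts forms like `10.10.1.1 : 2181`.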
303965 No Perforce job exists for this issue. 2 251750
7 years, 8 weeks, 6 days ago
Reviewed
0|i17h5z:
ZooKeeper ZOOKEEPER-1618

Disconnected event when stopping leader process

Improvement Open Minor Unresolved Unassigned Peter Nerg Peter Nerg 09/Jan/13 06:15   26/Feb/13 05:35   3.4.4, 3.4.5   documentation   1 4   Linux SLES
java version "1.6.0_31"
Running a three-node ZK cluster, I stop/kill the leader node.
Immediately, all connected clients receive a Disconnected event; a second or so later, a SyncConnected event is received.
Killing a follower will not produce the same issue/event.

The application/clients have been implemented to manage Disconnected events so they survive.
However, I expected the ZK client to handle the hiccup during the election process.
This produces quite a lot of logging in large clusters that have many services relying on ZK.
In some cases we may lose a few requests, as we need a working ZK cluster to execute them.

IMHO it's not really full high availability if the ZK cluster momentarily takes a dive because the leader goes away.
No matter how much redundancy one uses in form of ZK instances one still may get processing errors during leader election.

I've verified this behavior in both 3.4.4 and 3.4.5
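Since the client does report Disconnected during the election, one application-side approach is to hold requests until SyncConnected arrives instead of failing them. The sketch below is hypothetical (not part of the ZooKeeper client API); its State enum stands in for Watcher.Event.KeeperState (SyncConnected, Disconnected, Expired) so it runs without the client library.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper, not part of the ZooKeeper client API. The State enum stands in for
// Watcher.Event.KeeperState (SyncConnected, Disconnected, Expired) so the sketch runs alone.
class ConnectionGate {
    enum State { SYNC_CONNECTED, DISCONNECTED, EXPIRED }

    private final ReentrantLock lock = new ReentrantLock();
    private final Condition changed = lock.newCondition();
    private State state = State.DISCONNECTED;

    // Call this from the watcher for every connection-state event.
    void onStateChange(State newState) {
        lock.lock();
        try {
            state = newState;
            changed.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Wait for SyncConnected instead of failing the request outright. Returns false on
    // timeout or if the session expired (a new session is needed in that case).
    boolean awaitConnected(long timeout, TimeUnit unit) throws InterruptedException {
        long nanos = unit.toNanos(timeout);
        lock.lock();
        try {
            while (state != State.SYNC_CONNECTED) {
                if (state == State.EXPIRED || nanos <= 0) {
                    return false;
                }
                nanos = changed.awaitNanos(nanos);
            }
            return true;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ConnectionGate gate = new ConnectionGate();
        gate.onStateChange(State.DISCONNECTED); // e.g. the leader was killed
        Thread election = new Thread(() -> {
            try {
                Thread.sleep(200); // pretend a new leader takes ~200 ms to be elected
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
            gate.onStateChange(State.SYNC_CONNECTED);
        });
        election.start();
        System.out.println(gate.awaitConnected(5, TimeUnit.SECONDS)); // true
        election.join();
    }
}
```

The timeout bounds how long a request may wait; choosing it below the session timeout keeps callers from blocking past the point where the session could expire anyway.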
303363 No Perforce job exists for this issue. 0 250703
7 years, 4 weeks, 2 days ago 0|i17apj:
ZooKeeper ZOOKEEPER-1617

zookeeper version error log info ?

Bug Open Major Unresolved Unassigned wangwei wangwei 09/Jan/13 05:46   22/Jan/13 17:40           0 2   2012-12-31 10:51:41,562-[TS] INFO main-EventThread org.I0Itec.zkclient.ZkClient - zookeeper state changed (Disconnected)
2012-12-31 10:51:43,008-[TS] INFO main-SendThread(17.22.17.1:2181) org.apache.zookeeper.ClientCnxn - Opening socket connection to server /17.22.17.1:2181. Will not attempt to authenticate using SASL (unknown error)
2012-12-31 10:51:43,009-[TS] INFO main-SendThread(17.22.17.1:2181) org.apache.zookeeper.ClientCnxn - Socket connection established to /17.22.17.1:2181, initiating session
2012-12-31 10:51:43,011-[TS] WARN main-SendThread(17.22.17.1:2181) org.apache.zookeeper.ClientCnxnSocket - Connected to an old server; r-o mode will be unavailable
2012-12-31 10:51:43,011-[TS] INFO main-SendThread(17.22.17.1:2181) org.apache.zookeeper.ClientCnxn - Session establishment complete on server /17.22.17.1:2181, sessionid = 0x13b8a23254100be, negotiated timeout = 6000
2012-12-31 10:51:43,012-[TS] INFO main-EventThread org.I0Itec.zkclient.ZkClient - zookeeper state changed (SyncConnected)
2012-12-31 10:51:47,012-[TS] INFO main-SendThread(17.22.17.1:2181) org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 4002ms for sessionid 0x13b8a23254100be, closing socket connection and attempting reconnect


ZooKeeper client is 3.4.4
ZooKeeper server is 3.3.4

A 3.4.4 client is used to connect to a 3.3.4 server.
303359 No Perforce job exists for this issue. 0 250699
7 years, 9 weeks, 2 days ago 0|i17aon:
ZooKeeper ZOOKEEPER-1616

time calculations should use a monotonic clock

Bug Resolved Major Duplicate Unassigned Todd Lipcon Todd Lipcon 08/Jan/13 19:35   11/Apr/15 17:44 11/Apr/15 17:44         0 9   We recently had an issue with ZooKeeper sessions acting strangely due to a bad NTP setup on a set of hosts. Looking at the code, ZK seems to use System.currentTimeMillis to measure durations or intervals in many places. This is bad since that time can move backwards or skip ahead by several minutes. Instead, it should use System.nanoTime (or a wrapper such as Guava's Stopwatch class). 303295 No Perforce job exists for this issue. 0 250468
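As a minimal sketch of the suggested approach, using plain System.nanoTime() rather than Guava's Stopwatch, a timeout can be tracked as a monotonic deadline. The class name is hypothetical.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not ZooKeeper code. System.nanoTime() is only meaningful as a
// difference between two readings in the same JVM, but it does not jump when NTP or an
// administrator steps the wall clock, unlike System.currentTimeMillis().
class MonotonicDeadline {
    private final long deadlineNanos;

    MonotonicDeadline(long timeout, TimeUnit unit) {
        this.deadlineNanos = System.nanoTime() + unit.toNanos(timeout);
    }

    long remainingMillis() {
        // Subtract before comparing: nanoTime may overflow, but the difference stays correct.
        return TimeUnit.NANOSECONDS.toMillis(deadlineNanos - System.nanoTime());
    }

    boolean expired() {
        return deadlineNanos - System.nanoTime() <= 0;
    }

    public static void main(String[] args) throws InterruptedException {
        MonotonicDeadline sessionTimeout = new MonotonicDeadline(100, TimeUnit.MILLISECONDS);
        System.out.println("expired at start: " + sessionTimeout.expired());
        Thread.sleep(150);
        System.out.println("expired after sleeping 150 ms: " + sessionTimeout.expired());
    }
}
```

Comparing `deadline - now` against zero, rather than `now > deadline`, is what keeps the check correct if the nanoTime counter wraps.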
4 years, 49 weeks, 5 days ago 0|i1799b:
ZooKeeper ZOOKEEPER-1615

minor typos in ZooKeeper Programmer's Guide web page

Improvement Closed Trivial Fixed Evan Zacks Evan Zacks Evan Zacks 07/Jan/13 15:51   13/Mar/14 14:17 25/Jan/13 02:17 3.4.5 3.4.6, 3.5.0 documentation   0 3   There are some minor typos and misspellings in the Programmer's Guide web page. documentation 303005 No Perforce job exists for this issue. 1 250118
6 years, 2 weeks ago
Reviewed
0|i1773j:
ZooKeeper ZOOKEEPER-1614

zoo_multi c MT client windows crash

Bug Open Major Unresolved Unassigned Richard Dermer Richard Dermer 02/Jan/13 18:36   02/Jan/13 18:42   3.4.5   c client   0 1   Windows C MT client The Windows C multithreaded client crashes when using the zoo_multi APIs. The underlying cause is that the mutex and condition variables need to be initialized with pthread_cond_init and pthread_mutex_init.

Attached are the files I've modified to make this work. In the modified files, I've added a "multi" command to the CLI; when Cli.exe (MT build) is run on Windows without the rest of the fixes, this command crashes.
302299 No Perforce job exists for this issue. 1 248966
7 years, 12 weeks, 1 day ago 0|i16zzr:
ZooKeeper ZOOKEEPER-1613

The documentation still points to 2008 in the copyright notice

Bug Closed Trivial Fixed Edward Ribeiro Edward Ribeiro Edward Ribeiro 30/Dec/12 18:47   13/Mar/14 14:16 25/Jan/13 02:29 3.4.5 3.3.7, 3.4.6, 3.5.0 documentation 30/Dec/12 0 3   While fiddling with DocBook to fix the broken links of ZOOKEEPER-1488, I noticed that the copyright notice throughout the documentation still shows only the year 2008. I am submitting a patch to fix this. newbie 302080 No Perforce job exists for this issue. 1 248721
6 years, 2 weeks ago
Reviewed
docs 0|i16yhb:
ZooKeeper ZOOKEEPER-1612

Zookeeper unable to recover and start once datadir disk is full and disk space cleared

Bug Resolved Major Duplicate Unassigned suja s suja s 27/Dec/12 01:57   16/Jan/13 13:37 16/Jan/13 13:37 3.4.3       0 3   Once the ZooKeeper data dir disk becomes full, the process shuts down.
{noformat}
2012-12-14 13:22:26,959 [myid:2] - ERROR [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@276] - Severe unrecoverable error, exiting
java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:282)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
at java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:56)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
at org.apache.jute.BinaryOutputArchive.writeBuffer(BinaryOutputArchive.java:119)
at org.apache.zookeeper.server.DataNode.serialize(DataNode.java:168)
at org.apache.jute.BinaryOutputArchive.writeRecord(BinaryOutputArchive.java:123)
at org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:1115)
at org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:1130)
at org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:1130)
at org.apache.zookeeper.server.DataTree.serialize(DataTree.java:1179)
at org.apache.zookeeper.server.util.SerializeUtils.serializeSnapshot(SerializeUtils.java:138)
at org.apache.zookeeper.server.persistence.FileSnap.serialize(FileSnap.java:213)
at org.apache.zookeeper.server.persistence.FileSnap.serialize(FileSnap.java:230)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.save(FileTxnSnapLog.java:242)
at org.apache.zookeeper.server.ZooKeeperServer.takeSnapshot(ZooKeeperServer.java:274)
at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:407)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:82)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:759)
{noformat}

Later, the disk space is cleared and ZK is started again. Startup fails because ZK cannot read the snapshot properly. (Since loading from disk failed, it cannot join the peers in the quorum and get a snapshot diff.)
{noformat}

2012-12-14 16:20:31,489 [myid:2] - INFO [main:FileSnap@83] - Reading snapshot ../dataDir/version-2/snapshot.1000000042
2012-12-14 16:20:31,564 [myid:2] - ERROR [main:QuorumPeer@472] - Unable to load database on disk
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:504)
at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:132)
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:436)
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:428)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:152)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
2012-12-14 16:20:31,566 [myid:2] - ERROR [main:QuorumPeerMain@89] - Unexpected exception, exiting abnormally
java.lang.RuntimeException: Unable to run quorum server
at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:473)
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:428)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:152)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:504)
at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:132)

{noformat}


301854 No Perforce job exists for this issue. 0 248477
7 years, 10 weeks, 1 day ago 0|i16wzr:
ZooKeeper ZOOKEEPER-1611

cbcfhbf vfgbfb

Bug Resolved Major Invalid Unassigned prabhu sharma prabhu sharma 27/Dec/12 01:23   27/Dec/12 03:57 27/Dec/12 03:57         0 2   301852 No Perforce job exists for this issue. 0 248474
7 years, 13 weeks ago 0|i16wz3:
ZooKeeper ZOOKEEPER-1610

Some classes are using == or != to compare Long/String objects instead of .equals()

Bug Closed Critical Fixed Edward Ribeiro Edward Ribeiro Edward Ribeiro 26/Dec/12 12:31   13/Mar/14 14:17 11/Oct/13 15:19 3.4.5, 3.5.0 3.4.6, 3.5.0 java client, quorum 26/Dec/12 0 4   The classes org.apache.zookeeper.client.ZooKeeperSaslClient.java and
org.apache.zookeeper.server.quorum.flexible.QuorumHierarchical.java compare Strings and/or Longs using referential equality.

Usually this is not a problem, because small Long values are cached and string literals are interned, but I have had problems with this kind of comparison in the past because one production JVM didn't reuse the objects.
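The difference between reference and value comparison can be shown with values chosen so the results do not depend on JVM caching. The helper name `sameValue` is hypothetical; `Objects.equals` is the standard null-safe comparison.

```java
import java.util.Objects;

// Illustration of the pitfall described above; results do not depend on boxing caches.
class ReferenceEqualityDemo {
    // Null-safe value comparison for boxed ids such as Long.
    static boolean sameValue(Long a, Long b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        String literal = "zookeeper";
        String built = new String("zookeeper");     // always a distinct object
        System.out.println(literal == built);      // false: compares references
        System.out.println(literal.equals(built)); // true: compares contents

        Long a = Long.valueOf(1_000_000L);
        Long b = Long.valueOf(1_000_000L);
        // a == b depends on whether the JVM reuses the boxed object, so compare values:
        System.out.println(sameValue(a, b));       // true
        System.out.println(sameValue(a, null));    // false, and no NullPointerException
    }
}
```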
301818 No Perforce job exists for this issue. 2 248439
6 years, 2 weeks ago
Reviewed
0|i16wrb:
ZooKeeper ZOOKEEPER-1609

Improve ZooKeeper performance under mixed workload

Improvement Resolved Major Duplicate Unassigned Thawan Kooburat Thawan Kooburat 22/Dec/12 15:35   07/Apr/17 18:51 07/Apr/17 18:51 3.4.3   server   1 5   ZOOKEEPER-1505 allows 1 write or N reads to pass through the CommitProcessor at any given time. I did a performance experiment similar to http://wiki.apache.org/hadoop/ZooKeeper/Performance and found that read throughput drops dramatically when there are write requests. After a bit more investigation, I found that the biggest bottleneck is the request queue entering the CommitProcessor.

When the CommitProcessor sees a write request, it must block the entire pipeline and wait for the matching commit from the leader. This means that no read requests (including pings) can go through. The time spent waiting for the commit from the leader far exceeds the time spent waiting for 1 write to go through the CommitProcessor.

The current plan is to create multiple request queues at the front of the CommitProcessor. Requests are hashed by sessionId and sent to one of the queues. Whenever the CommitProcessor sees a write request on one of the queues, it moves on to process read requests from the other queues. It will have to unblock the write requests in the same order in which they were sent to the leader, so it may need to maintain a separate list to keep track of that.

Correctness is the same as having more learners in the ensemble: sessions hashed onto different queues behave like sessions connected to different learners.

I am hoping that this will improve read throughput and reduce the disconnect rate on an ensemble with a large number of clients.
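The proposed queueing can be modelled in a simplified way: requests are hashed by sessionId into queues, a write at the head of a queue stalls only that queue, and commits unblock writes in the order they were sent. Everything here is hypothetical and simplified (for example, a write counts as "sent" when it reaches the head of its queue); it is not CommitProcessor's code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical, simplified model of the proposal (not CommitProcessor's code): requests
// are hashed by sessionId into queues, and a pending write only stalls its own queue.
class MultiQueueCommitSketch {
    static final class Request {
        final long sessionId;
        final boolean isWrite;
        final String name;

        Request(long sessionId, boolean isWrite, String name) {
            this.sessionId = sessionId;
            this.isWrite = isWrite;
            this.name = name;
        }
    }

    private final List<Deque<Request>> queues = new ArrayList<>();
    private final Deque<Request> awaitingCommit = new ArrayDeque<>(); // in leader order
    final List<String> processed = new ArrayList<>();

    MultiQueueCommitSketch(int queueCount) {
        for (int i = 0; i < queueCount; i++) {
            queues.add(new ArrayDeque<>());
        }
    }

    void submit(Request r) {
        int idx = Math.floorMod(Long.hashCode(r.sessionId), queues.size());
        queues.get(idx).addLast(r);
    }

    // Drain each queue until its head is a write that still needs a commit.
    void processReads() {
        for (Deque<Request> q : queues) {
            while (!q.isEmpty()) {
                Request head = q.peekFirst();
                if (head.isWrite) {
                    if (!awaitingCommit.contains(head)) {
                        awaitingCommit.addLast(head); // remember the order sent to the leader
                    }
                    break; // only this queue stalls
                }
                processed.add(q.pollFirst().name);
            }
        }
    }

    // Commits arrive in the order the writes were sent, so unblock them in that order.
    void onCommit() {
        Request w = awaitingCommit.pollFirst();
        if (w == null) {
            return;
        }
        for (Deque<Request> q : queues) {
            if (q.peekFirst() == w) {
                q.pollFirst();
                processed.add(w.name);
            }
        }
        processReads();
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2}) {
            MultiQueueCommitSketch cp = new MultiQueueCommitSketch(n);
            cp.submit(new Request(1, true, "w1"));
            cp.submit(new Request(1, false, "r1"));
            cp.submit(new Request(2, false, "r2"));
            cp.processReads();
            System.out.println(n + " queue(s), before commit: " + cp.processed);
            cp.onCommit();
            System.out.println(n + " queue(s), after commit:  " + cp.processed);
        }
    }
}
```

With one queue, session 2's read waits behind session 1's write until the commit arrives; with two queues it is served immediately, while session 1's own read still waits for its write, preserving per-session order.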

301660 No Perforce job exists for this issue. 0 248211
2 years, 49 weeks, 6 days ago 0|i16vcn:
ZooKeeper ZOOKEEPER-1608

Add support for key-value store as optional storage engine

Improvement Open Major Unresolved Unassigned Thawan Kooburat Thawan Kooburat 22/Dec/12 00:29   24/Jan/13 14:22   3.4.3   server   1 6   Problem:
1. ZooKeeper needs to load the entire dataset into memory, so the total data size and number of znodes are limited by the amount of available memory.
2. We want to minimize ZooKeeper downtime, but found that it is bound by snapshot loading and writing time. The bigger the database, the longer it takes for the system to recover. In the worst case, if the data size grows too large and initLimit isn't updated accordingly, the quorum won't form after a failure.

Implementation: (still work in progress)

1. Create a new type of DataTree that supports key-value storage as its backing store. Our current candidate backing store is Oracle's Berkeley DB Java Edition.

2. There is no need to use the snapshot facility for this type of DataTree, since doing a sync write of lastProcessedZxid into the backing store is equivalent to taking a snapshot. However, the system still uses the txnlog as before. The system can be considered as having only a single snapshot, and it has to rely on the backing store for data corruption detection and recovery.

3. There is no need to do any per-node locking. The CommitProcessor (ZOOKEEPER-1505) prevents concurrent reads and writes from reaching the DataTree. The DataTree is also accessed by PrepRequestProcessor (to create ChangeRecords), but I believe that reads and writes to the same znode cannot happen concurrently.

4. There are 3 types of data that must be persisted in the backing store: ACLs, znodes, and sessions. However, we also store other data, such as each node's list of children and the list of ephemeral nodes, to reduce DataTree initialization time or serialization cost.

5. Each ZooKeeper txn may translate into multiple actions on the DataTree. For example, creating a node may result in AddingZNODE, AddingChildren, and AddingEphemeralNode. However, as long as these operations are idempotent, there is no need to group them into a transaction, so txns can be replayed on the DataTree without corrupting the data. This also means that the system doesn't need a key-value store that supports transactional semantics. Currently, only quota-related operations break this assumption, because they use an increment operation.

6. The SNAP protocol is supported, so the ensemble can be upgraded online. In the future, we may extend the SNAP protocol to send raw data files in order to save CPU cost when sending a large database.
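The idempotence argument in point 5 can be checked with a toy store. In this hypothetical sketch a HashMap stands in for the key-value backing store (e.g. Berkeley DB JE), and the actions loosely mirror AddingZNODE / AddingChildren; replaying the node-creation puts leaves the same state, while replaying a quota-style increment does not.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a HashMap stands in for the key-value backing store, and the
// actions loosely mirror the AddingZNODE / AddingChildren examples above.
class IdempotentReplaySketch {
    interface Action {
        void apply(Map<String, String> kv);
    }

    // Creating a node as plain puts: applying it twice leaves the same state.
    static List<Action> createNode(String parent, String path, String data) {
        return List.of(
                kv -> kv.put("znode:" + path, data),
                kv -> kv.put("children:" + parent + ":" + path, ""));
    }

    // Quota-style counter update: an increment, so replaying it changes the result.
    static List<Action> incrementQuotaCount(String path) {
        return List.of(kv -> kv.merge("count:" + path, "1",
                (old, one) -> String.valueOf(Integer.parseInt(old) + 1)));
    }

    // Apply the same txn `times` times to a fresh store, as a crash-and-replay would.
    static Map<String, String> replay(List<Action> txn, int times) {
        Map<String, String> kv = new HashMap<>();
        for (int i = 0; i < times; i++) {
            txn.forEach(a -> a.apply(kv));
        }
        return kv;
    }

    public static void main(String[] args) {
        List<Action> create = createNode("/", "/a", "hello");
        System.out.println(replay(create, 1).equals(replay(create, 2))); // true
        List<Action> bump = incrementQuotaCount("/a");
        System.out.println(replay(bump, 1).equals(replay(bump, 2)));     // false
    }
}
```

One way to make the quota case replay-safe would be to store the absolute count computed at txn time rather than an increment, though that choice is not taken from the proposal.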
301632 No Perforce job exists for this issue. 0 248175
7 years, 9 weeks ago 0|i16v4n:
ZooKeeper ZOOKEEPER-1607

Read-only Observer

Improvement Patch Available Major Unresolved Raúl Gutiérrez Segalés Thawan Kooburat Thawan Kooburat 21/Dec/12 23:40   14/Dec/19 06:06   3.4.3 3.7.0 server   1 8   This feature reuses some of the mechanisms already provided by ReadOnlyZooKeeper (ZOOKEEPER-704), but implements them in a different way.

Goal: read-only clients should be able to connect to the observer, or continue to read data from it, even when there is an outage of the underlying quorum. This means that the observer can provide 100% read uptime for read-only local sessions (ZOOKEEPER-1147).

Implementation:
The observer doesn't tear itself down when it loses its connection to the leader. It only closes the connections associated with non-read-only sessions and global sessions, so those clients can try another observer if this is a temporary failure.

During the outage, the observer switches to read-only mode. All pending and future write requests will get a NOT_READONLY error. The read-only state transition is sent to all sessions on that observer, and the observer only accepts new connections from read-only clients.

When the observer is able to reconnect to the leader, it sends a state transition (CONNECTED_STATE) to all current sessions. If it is able to synchronize with the leader using DIFF, the stream of txns is sent through the commit processor instead of being applied to the DataTree directly, to prevent a race condition with in-flight read requests (see ZOOKEEPER-1505). Clients will receive watch events correctly and can start issuing write requests.

However, if the observer has to fetch a snapshot, it needs to drop all connections, since it cannot fire watches correctly.
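The session handling described above can be summarised as a small state model. This is a hypothetical sketch, not ZooKeeper code: the class, enums, and fields are illustrative, "NOT_READONLY" is the error name used in the description, and the DIFF path through the commit processor is not modelled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the observer behaviour described above; not ZooKeeper code.
// "NOT_READONLY" is the error name used in the description.
class ReadOnlyObserverSketch {
    enum Mode { CONNECTED, READ_ONLY }
    enum SyncType { DIFF, SNAP }

    static final class Session {
        final long id;
        final boolean readOnly;
        final boolean local;

        Session(long id, boolean readOnly, boolean local) {
            this.id = id;
            this.readOnly = readOnly;
            this.local = local;
        }
    }

    Mode mode = Mode.CONNECTED;
    final List<Session> sessions = new ArrayList<>();

    // Leader lost: keep read-only local sessions, close the rest so they can fail over.
    void onLeaderLost() {
        mode = Mode.READ_ONLY;
        sessions.removeIf(s -> !(s.readOnly && s.local));
    }

    boolean accept(Session s) {
        if (mode == Mode.READ_ONLY && !s.readOnly) {
            return false; // only read-only clients may connect during the outage
        }
        sessions.add(s);
        return true;
    }

    // Returns null if the write may proceed, otherwise the error the client receives.
    String submitWrite() {
        return mode == Mode.READ_ONLY ? "NOT_READONLY" : null;
    }

    void onLeaderReconnected(SyncType sync) {
        if (sync == SyncType.SNAP) {
            sessions.clear(); // watches cannot be fired correctly after a snapshot
        }
        // With DIFF, txns would be fed through the commit processor (not modelled here).
        mode = Mode.CONNECTED;
    }

    public static void main(String[] args) {
        ReadOnlyObserverSketch obs = new ReadOnlyObserverSketch();
        obs.accept(new Session(1, true, true));
        obs.accept(new Session(2, false, true));
        obs.accept(new Session(3, true, false));
        obs.onLeaderLost();
        System.out.println(obs.sessions.size());                     // 1
        System.out.println(obs.accept(new Session(4, false, true))); // false
        System.out.println(obs.submitWrite());                       // NOT_READONLY
        obs.onLeaderReconnected(SyncType.DIFF);
        System.out.println(obs.mode + " " + obs.sessions.size());    // CONNECTED 1
    }
}
```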
301631 No Perforce job exists for this issue. 1 248174
5 years, 50 weeks ago 0|i16v4f:
ZooKeeper ZOOKEEPER-1606

intermittent failures in ZkDatabaseCorruptionTest on jenkins

Bug Closed Major Fixed lixiaofeng Patrick D. Hunt Patrick D. Hunt 21/Dec/12 17:19   13/Mar/14 14:17 19/Feb/13 03:19 3.4.5, 3.5.0 3.4.6, 3.5.0 tests   0 4   ZkDatabaseCorruptionTest is failing intermittently on jenkins with:

"Error Message: the last server is not the leader"

Seeing this on jdk7/openjdk7/solaris - 3 times in the last month.

https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-trunk-openjdk7/2/testReport/junit/org.apache.zookeeper.test/ZkDatabaseCorruptionTest/testCorruption/
newbie, test-patch 301596 No Perforce job exists for this issue. 1 248138
6 years, 2 weeks ago
Reviewed
0|i16uwf:
ZooKeeper ZOOKEEPER-1605

Make RMI port configurable

Improvement Open Major Unresolved Unassigned Joey Echeverria Joey Echeverria 19/Dec/12 07:40   19/Dec/12 13:58   3.4.5   jmx   2 4   JMX uses two ports: the JMX remote port and the RMI server port. The default JMX agent lets you configure the JMX remote port via the com.sun.management.jmxremote.port system property, but the RMI server port is chosen randomly at runtime. It's possible to create a custom agent that sets the RMI port to a configurable value:

http://olegz.wordpress.com/2009/03/23/jmx-connectivity-through-the-firewall/

Making the RMI port configurable is critical to being able to monitor ZK with JMX through a firewall.
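Below is a minimal sketch of one common way to build such a custom agent with the standard `javax.management.remote` API. It creates an RMI registry on one fixed port and puts a second fixed port into the service URL, which pins the RMI server port. `FixedPortJmxAgent` is a hypothetical name, and this is not ZooKeeper code. The port values in any usage are only examples.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.net.MalformedURLException;
import java.rmi.registry.LocateRegistry;
import javax.management.MBeanServer;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper, not part of ZooKeeper.
public final class FixedPortJmxAgent {
    private FixedPortJmxAgent() {}

    /**
     * Builds service:jmx:rmi://HOST:RMI_SERVER_PORT/jndi/rmi://HOST:REGISTRY_PORT/jmxrmi.
     * The port after the first "rmi://" fixes the RMI server port that the
     * default agent would otherwise pick at random.
     */
    public static JMXServiceURL serviceUrl(String host, int registryPort, int rmiServerPort)
            throws MalformedURLException {
        return new JMXServiceURL("rmi", host, rmiServerPort,
                "/jndi/rmi://" + host + ":" + registryPort + "/jmxrmi");
    }

    /** Starts an RMI registry plus a connector server; both ports are fixed, so both can be opened in a firewall. */
    public static JMXConnectorServer start(String host, int registryPort, int rmiServerPort)
            throws IOException {
        LocateRegistry.createRegistry(registryPort);
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        JMXConnectorServer server = JMXConnectorServerFactory.newJMXConnectorServer(
                serviceUrl(host, registryPort, rmiServerPort), null, mbs);
        server.start();
        return server;
    }
}
```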
300444 No Perforce job exists for this issue. 0 244477
7 years, 14 weeks, 1 day ago 0|i168av:
ZooKeeper ZOOKEEPER-1604

remove rpm/deb/... packaging

Task Closed Major Fixed Chris Nauroth Patrick D. Hunt Patrick D. Hunt 16/Dec/12 12:53   21/Jul/16 16:18 03/Mar/16 11:14 3.3.0 3.5.2, 3.6.0 build   0 12   Remove rpm/deb/... packaging from our source repo. Now that BigTop is available and fully supporting ZK it's no longer necessary for us to attempt to include this. 298954 No Perforce job exists for this issue. 2 242719
4 years, 1 week, 4 days ago
Reviewed
0|i15xg7:
ZooKeeper ZOOKEEPER-1603

StaticHostProviderTest testUpdateClientMigrateOrNot hangs

Bug Closed Blocker Fixed Flavio Paiva Junqueira Patrick D. Hunt Patrick D. Hunt 16/Dec/12 03:27   13/Mar/14 14:17 26/Sep/13 17:16 3.5.0 3.4.6, 3.5.0 tests   0 5   StaticHostProviderTest method testUpdateClientMigrateOrNot hangs forever.

On my laptop getHostName for 10.10.10.* takes 5+ seconds per call. As a result this method effectively runs forever.

The hang is consistent: it happens every time I run this test.
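The slowness comes from `getHostName()` doing a reverse DNS lookup. One possible way to avoid it is sketched below, assuming the test only needs a textual label for each address. `InetSocketAddress.getHostString()` returns the literal IP without any reverse lookup, and `InetAddress.getByAddress` builds an address without consulting DNS. `NoReverseLookup` is a hypothetical helper; the actual fix for this issue may differ.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

// Hypothetical helper for tests that use literal addresses such as 10.10.10.*.
public final class NoReverseLookup {
    private NoReverseLookup() {}

    /** Builds an address from raw IPv4 octets; getByAddress never queries DNS. */
    public static InetSocketAddress literal(int a, int b, int c, int d, int port)
            throws UnknownHostException {
        InetAddress ip = InetAddress.getByAddress(
                new byte[] {(byte) a, (byte) b, (byte) c, (byte) d});
        return new InetSocketAddress(ip, port);
    }

    /** Unlike getHostName(), getHostString() never attempts a reverse lookup. */
    public static String label(InetSocketAddress addr) {
        return addr.getHostString();
    }
}
```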
298888 No Perforce job exists for this issue. 4 242560
6 years, 2 weeks ago
Reviewed
0|i15wgv:
ZooKeeper ZOOKEEPER-1602

a change to QuorumPeerConfig's API broke compatibility with HBase

Bug Resolved Blocker Fixed Alexander Shraer Patrick D. Hunt Patrick D. Hunt 14/Dec/12 19:40   16/Dec/12 06:04 16/Dec/12 01:44 3.5.0 3.5.0 server   0 3   The following patch broke an API that is in use by HBase; apart from that, current trunk compiles fine when used by HBase:

bq. ZOOKEEPER-1411. Consolidate membership management, distinguish between static and dynamic configuration parameters (Alex Shraer via breed)

Considering it a blocker even though it's not really a "public" API. If possible we should add back "getServers" method on QuorumPeerConfig to reduce friction for the hbase team.
newbie 298438 No Perforce job exists for this issue. 1 241856
7 years, 14 weeks, 4 days ago
Reviewed
0|i15s4f:
ZooKeeper ZOOKEEPER-1601

document changes for multi-threaded CommitProcessor and NIOServerCnxn

Improvement Resolved Major Fixed Thawan Kooburat Patrick D. Hunt Patrick D. Hunt 12/Dec/12 02:08   25/Jan/13 15:24 25/Jan/13 02:37 3.5.0 3.5.0 documentation   0 1   ZOOKEEPER-1504 and ZOOKEEPER-1505 introduce changes that should be documented - such as new configuration parameters/defaults, etc... We should also verify that nothing else needs to be changed in the documentation related to these changes. 297202 No Perforce job exists for this issue. 2 235196
7 years, 8 weeks, 6 days ago
Reviewed
0|i14n13:
ZooKeeper ZOOKEEPER-1600

Ephemeral node not getting deleted

Bug Resolved Major Not A Problem Patrick D. Hunt Deepa Muthunoori Deepa Muthunoori 10/Dec/12 01:43   15/Feb/13 20:21 15/Feb/13 20:21         0 2   Closing a session does not delete all of its ephemeral nodes.

(E.g., from the log: session ID 0x23b6ad21d160000 creates two ephemeral nodes (/CONFIGNODE/NP2147483647 and /ACTIVE/192.168.11.94), but when the session expires, only /CONFIGNODE/NP2147483647 is deleted.)
296695 No Perforce job exists for this issue. 1 233943
7 years, 15 weeks, 1 day ago 0|i14fb3:
ZooKeeper ZOOKEEPER-1599

3.3 server cannot join 3.4 quorum

Bug Closed Blocker Not A Problem Skye Wanderman-Milne Skye Wanderman-Milne Skye Wanderman-Milne 07/Dec/12 13:54   13/Mar/14 14:17 17/Sep/13 18:30 3.3.6, 3.4.5 3.4.6 quorum   0 7   When a 3.3 server attempts to join an existing quorum led by a 3.4 server, the 3.3 server is disconnected while trying to download the leader's snapshot. The 3.3 server restarts and starts the process over again, but is never able to join the quorum.

3.3 server log:
{code}
2012-12-07 10:44:34,582 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2183:Learner@294] - Getting a snapshot from leader
2012-12-07 10:44:34,582 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2183:Learner@325] - Setting leader epoch 12
2012-12-07 10:44:54,604 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2183:Follower@82] - Exception when following the leader
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:148)
at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:332)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:75)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:645)
2012-12-07 10:44:54,605 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2183:Follower@165] - shutdown called
java.lang.Exception: shutdown Follower
at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:165)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:649)
{code}

3.4 leader log:
{code}
2012-12-07 10:51:35,178 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection$Messenger$WorkerReceiver@273] - Backward compatibility mode, server id=3
2012-12-07 10:51:35,178 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@542] - Notification: 3 (n.leader), 0x1100000000 (n.zxid), 0x2 (n.round), LOOKING (n.state), 3 (n.sid), 0x11 (n.peerEPoch), LEADING (my state)
2012-12-07 10:51:35,182 [myid:2] - INFO [LearnerHandler-/127.0.0.1:37654:LearnerHandler@263] - Follower sid: 3 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@262f4873
2012-12-07 10:51:35,182 [myid:2] - INFO [LearnerHandler-/127.0.0.1:37654:LearnerHandler@318] - Synchronizing with Follower sid: 3 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x1100000000
2012-12-07 10:51:35,182 [myid:2] - INFO [LearnerHandler-/127.0.0.1:37654:LearnerHandler@395] - Sending SNAP
2012-12-07 10:51:35,183 [myid:2] - INFO [LearnerHandler-/127.0.0.1:37654:LearnerHandler@419] - Sending snapshot last zxid of peer is 0x1100000000 zxid of leader is 0x1200000000sent zxid of db as 0x1200000000
2012-12-07 10:51:55,204 [myid:2] - ERROR [LearnerHandler-/127.0.0.1:37654:LearnerHandler@562] - Unexpected exception causing shutdown while sock still open
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:150)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:450)
2012-12-07 10:51:55,205 [myid:2] - WARN [LearnerHandler-/127.0.0.1:37654:LearnerHandler@575] - ******* GOODBYE /127.0.0.1:37654 ********
{code}
296538 No Perforce job exists for this issue. 1 233090
6 years, 2 weeks ago During a rolling upgrade from the 3.3 branch to the 3.4 branch, a 3.3 server cannot follow a 3.4 leader. If there is an election during the upgrade and the new leader is a 3.4 server, the 3.3 server will be unavailable until it is upgraded. If a 3.3 server leads during the upgrade and is the last one to be upgraded, no problem should be observed. 0|i14a1j:
ZooKeeper ZOOKEEPER-1598

Ability to support more digits in the version string

Improvement Closed Major Fixed Raja Aluri Raja Aluri Raja Aluri 07/Dec/12 13:24   13/Mar/14 14:16 12/Dec/12 02:21   3.4.6, 3.5.0 build   0 3   Ability to support more digits in the version string.
ZooKeeper currently expects the version string to be in X.Y.Z-# format.
With this change, the default behavior stays the same (X.Y.Z-#) and will not break anything that exists today.
At the same time, it allows people to tag their own digits onto the version string. They can then apply a patch or two in their own environments and still distinguish an Apache ZooKeeper version from a locally modified ZooKeeper version.
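The exact grammar accepted after this change is not stated here. The sketch below assumes a hypothetical one: X.Y.Z, optionally followed by extra ".N" components, optionally followed by "-suffix" for the "#" part. A plain "X.Y.Z-#" string parses exactly as before, while "3.4.5.2-1" also parses. `VersionStringSketch` is an illustrative name, not ZooKeeper's version generator.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser illustrating "X.Y.Z-#" plus optional locally added digits.
public final class VersionStringSketch {
    private VersionStringSketch() {}

    // Groups: 1=major, 2=minor, 3=micro, 4=extra ".N" components (may be ""), 5=qualifier (may be null)
    private static final Pattern VERSION =
            Pattern.compile("^(\\d+)\\.(\\d+)\\.(\\d+)((?:\\.\\d+)*)(?:-(.+))?$");

    /** Returns {major, minor, micro, extra, qualifier}; throws if the string does not match. */
    public static String[] parse(String v) {
        Matcher m = VERSION.matcher(v);
        if (!m.matches()) {
            throw new IllegalArgumentException("Invalid version string: " + v);
        }
        return new String[] {m.group(1), m.group(2), m.group(3), m.group(4), m.group(5)};
    }
}
```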
296528 No Perforce job exists for this issue. 1 233080
6 years, 2 weeks ago
Reviewed
0|i149zb:
ZooKeeper ZOOKEEPER-1597

Windows build failing

Bug Closed Major Fixed Michi Mutsuzaki Alexander Shraer Alexander Shraer 04/Dec/12 03:28   13/Mar/14 14:17 17/Nov/13 06:42 3.5.0 3.4.6, 3.5.0 build, c client   0 6   Seems to be related to C client changes done for ZK-1355.
We're not sure why these build failures happen on Windows.

###################################################################################
########################## LAST 60 LINES OF THE CONSOLE ###########################
[...truncated 376 lines...]
.\src\zookeeper.c(768): error C2224: left of '.count' must have struct/union type
.\src\zookeeper.c(768): error C2065: 'i' : undeclared identifier
.\src\zookeeper.c(770): error C2065: 'resolved' : undeclared identifier
.\src\zookeeper.c(770): error C2224: left of '.data' must have struct/union type
.\src\zookeeper.c(770): error C2065: 'i' : undeclared identifier
.\src\zookeeper.c(773): error C2065: 'rc' : undeclared identifier
.\src\zookeeper.c(774): error C2065: 'rc' : undeclared identifier
.\src\zookeeper.c(780): error C2065: 'rc' : undeclared identifier
.\src\zookeeper.c(781): error C2065: 'rc' : undeclared identifier
.\src\zookeeper.c(788): error C2143: syntax error : missing ';' before 'type'
.\src\zookeeper.c(789): error C2143: syntax error : missing ';' before 'type'
.\src\zookeeper.c(792): error C2065: 'num_old' : undeclared identifier
.\src\zookeeper.c(792): error C2065: 'num_new' : undeclared identifier
.\src\zookeeper.c(794): error C2065: 'found_current' : undeclared identifier
.\src\zookeeper.c(797): error C2065: 'num_old' : undeclared identifier
.\src\zookeeper.c(797): error C2065: 'num_new' : undeclared identifier
.\src\zookeeper.c(814): error C2065: 'found_current' : undeclared identifier
.\src\zookeeper.c(819): error C2065: 'num_old' : undeclared identifier
.\src\zookeeper.c(819): error C2065: 'num_old' : undeclared identifier
.\src\zookeeper.c(819): error C2065: 'num_new' : undeclared identifier
.\src\zookeeper.c(819): error C2065: 'num_old' : undeclared identifier
.\src\zookeeper.c(819): error C2065: 'num_new' : undeclared identifier
.\src\zookeeper.c(819): error C2065: 'num_old' : undeclared identifier
.\src\zookeeper.c(825): error C2065: 'resolved' : undeclared identifier
.\src\zookeeper.c(825): error C2440: '=' : cannot convert from 'int' to 'addrvec_t'
.\src\zookeeper.c(843): error C2065: 'resolved' : undeclared identifier
.\src\zookeeper.c(843): error C2224: left of '.data' must have struct/union type
.\src\zookeeper.c(845): error C2065: 'resolved' : undeclared identifier
.\src\zookeeper.c(848): error C2065: 'hosts' : undeclared identifier
.\src\zookeeper.c(849): error C2065: 'hosts' : undeclared identifier
.\src\zookeeper.c(850): error C2065: 'hosts' : undeclared identifier
.\src\zookeeper.c(853): error C2065: 'rc' : undeclared identifier
.\src\zookeeper.c(1177): error C2143: syntax error : missing ';' before 'const'
.\src\zookeeper.c(1179): error C2065: 'endpoint_info' : undeclared identifier
.\src\zookeeper.c(1883): error C2143: syntax error : missing ';' before 'type'
.\src\zookeeper.c(1884): error C2065: 'rc' : undeclared identifier
.\src\zookeeper.c(1885): error C2065: 'rc' : undeclared identifier
.\src\zookeeper.c(1916): error C2143: syntax error : missing ';' before 'type'
.\src\zookeeper.c(1920): error C2143: syntax error : missing ';' before 'type'
.\src\zookeeper.c(1927): error C2065: 'ssoresult' : undeclared identifier
.\src\zookeeper.c(1927): error C2065: 'enable_tcp_nodelay' : undeclared identifier
.\src\zookeeper.c(1927): error C2065: 'enable_tcp_nodelay' : undeclared identifier
.\src\zookeeper.c(1928): error C2065: 'ssoresult' : undeclared identifier
.\src\zookeeper.c(1944): error C2065: 'rc' : undeclared identifier
.\src\zookeeper.c(1949): error C2065: 'rc' : undeclared identifier
.\src\zookeeper.c(1962): error C2065: 'rc' : undeclared identifier
.\src\zookeeper.c(1963): error C2065: 'rc' : undeclared identifier
.\src\zookeeper.c(2004): error C2065: 'rc' : undeclared identifier
.\src\zookeeper.c(2004): fatal error C1003: error count exceeds 100; stopping compilation

38 Warning(s)
102 Error(s)
295913 No Perforce job exists for this issue. 4 231939
6 years, 2 weeks ago
Reviewed
0|i142xz:
ZooKeeper ZOOKEEPER-1596

Zab1_0Test should ensure that the file is closed

Bug Closed Major Fixed Enis Soztutar Enis Soztutar Enis Soztutar 03/Dec/12 18:34   13/Mar/14 14:17 11/Dec/12 03:20 3.4.5, 3.5.0 3.4.6, 3.5.0     0 4   Zab1_0Test fails on windows with:
{code}
java.io.IOException: Could not rename temporary file C:\Users\ADMINI~1\AppData\Local\Temp\2\test6831881113551099349dir\version-2\acceptedEpoch.tmp to C:\Users\ADMINI~1\AppData\Local\Temp\2\test6831881113551099349dir\version-2\acceptedEpoch
at org.apache.zookeeper.common.AtomicFileOutputStream.close(AtomicFileOutputStream.java:82)
at org.apache.zookeeper.server.quorum.QuorumPeer.writeLongToFile(QuorumPeer.java:1121)
at org.apache.zookeeper.server.quorum.QuorumPeer.setAcceptedEpoch(QuorumPeer.java:1148)
at org.apache.zookeeper.server.quorum.Learner.registerWithLeader(Learner.java:281)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:72)
at org.apache.zookeeper.server.quorum.Zab1_0Test$1.run(Zab1_0Test.java:450)
{code}

The file handles for currentEpoch and acceptedEpoch are not closed, so deleting those files fails on Windows.
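The usual remedy for leaked handles is try-with-resources, which closes the stream even if an exception is thrown. Windows can then delete or replace the file. This is a minimal sketch: `EpochFileSketch`, `writeLong` and `readLong` are hypothetical helpers. They only loosely mirror what the stack trace shows (`QuorumPeer.writeLongToFile`, `AtomicFileOutputStream`) and are not the actual patch.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helpers for epoch-style files that always release their handles.
public final class EpochFileSketch {
    private EpochFileSketch() {}

    public static void writeLong(File f, long value) throws IOException {
        // try-with-resources closes the writer even if write() throws
        try (Writer w = Files.newBufferedWriter(f.toPath(), StandardCharsets.UTF_8)) {
            w.write(Long.toString(value));
        }
    }

    public static long readLong(File f) throws IOException {
        // Closing the reader matters on Windows: an open handle blocks delete/rename
        try (BufferedReader r = Files.newBufferedReader(f.toPath(), StandardCharsets.UTF_8)) {
            return Long.parseLong(r.readLine().trim());
        }
    }
}
```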
295822 No Perforce job exists for this issue. 1 231172
6 years, 2 weeks ago
Reviewed
0|i13y7j:
ZooKeeper ZOOKEEPER-1595

Sockets should be read until exhausted

Improvement Open Minor Unresolved Unassigned Nikita Vetoshkin Nikita Vetoshkin 03/Dec/12 02:46   11/Dec/12 16:16       server   2 2   Tested on Linux x64 with Oracle JDK6 The {{doIO}} method in {{NIOServerCnxn}} should read (and write, too) until {{read}}/{{write}} returns 0.
This is common practice when working with non-blocking sockets: when the underlying system call (multiplexer) signals that a socket is readable, one should {{recv(2)}} all data from the kernel buffer until {{recv}} fails with {{EAGAIN}} or {{EWOULDBLOCK}}.

The patch does two things (I know it's not a good idea to mix several changes, but I went ahead anyway):
* splits {{doIO}} into {{doRead}} and {{doWrite}}
* wraps reading with {{while (true)}}
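The read side of this idea can be sketched with plain NIO. On a non-blocking channel, `read()` returns 0 when the kernel buffer is empty (Java's equivalent of `EAGAIN`/`EWOULDBLOCK`) and -1 at end of stream. The loop keeps reading until one of those happens. `DrainingReader` is a hypothetical name, and this is not the actual patch to {{NIOServerCnxn}}.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

// Hypothetical sketch: drain a NON-BLOCKING channel after the selector reports it readable.
public final class DrainingReader {
    private DrainingReader() {}

    /**
     * Returns the number of bytes read in this call, or -1 if the peer closed
     * the stream before any bytes were read.
     */
    public static int drain(ReadableByteChannel ch, ByteBuffer scratch) throws IOException {
        int total = 0;
        while (true) {
            scratch.clear();
            int n = ch.read(scratch);
            if (n > 0) {
                total += n;
                // A real server would flip() scratch and hand it to its request parser here.
                continue;
            }
            if (n < 0) {
                // End of stream: report data already read; the next call will see -1.
                return total == 0 ? -1 : total;
            }
            return total; // n == 0: kernel buffer drained, go back to select()
        }
    }
}
```

Compared with one read per wakeup, draining the buffer saves a trip through `epoll_wait` whenever more data than one buffer's worth is pending. That is consistent with the syscall counts reported below.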

It's pretty easy to instrument the code with a counter and print how many loops we performed until the socket was not readable again.

I wrote a simple Python script (http://pastebin.com/N5ifM330) that creates 6000 nodes with 5k of data each, keeping 20 concurrent create requests in flight over one connection.
With this script and strace attached to the JVM, I counted epoll_wait syscalls during the test: ~9500 before vs. ~8000 after.
The run-time measurement is very rough, but it is around 19 s before vs. 17.5 s after.
newbie, performance 293370 No Perforce job exists for this issue. 1 167194
7 years, 15 weeks, 2 days ago 0|i0szaf:
ZooKeeper ZOOKEEPER-1594

TestReconfig intermittently fails

Bug Resolved Major Duplicate Marshall McMullen Marshall McMullen Marshall McMullen 30/Nov/12 00:27   05/Nov/16 13:30 05/Nov/16 13:30 3.5.0   c client   0 6   We've seen an intermittent failure in one of the C client tests TestReconfig which was committed as part of ZOOKEEPER-1355.

The test fails *before* any rebalancing algorithm is invoked. After inspecting it, we've concluded that the random number generator is not seeded properly. The same problem was seen and solved on the Java client side, so we just need to do something similar on the C client side.

The assertion:

Build/trunk/src/c/tests/TestReconfig.cc:571: Assertion: assertion failed [Expression: numClientsPerHost.at(i) >= lowerboundClientsPerServer(numClients, numServers)]
[exec] [exec] Failures !!!
[exec] [exec] Run: 38 Failure total: 1 Failures: 1 Errors: 0
[exec] [exec] make: *** [run-check] Error 1
[exec]
[exec] BUILD FAILED
[exec] /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1262: The following error occurred while executing this line:
[exec] /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1272: exec returned: 2

Also this one:

From the latest build logs:
[exec] Zookeeper_watchers::testChildWatcher2 : elapsed 54 : OK
[exec] /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/trunk/src/c/tests/TestReconfig.cc:183: Assertion: equality assertion failed [Expected: 1, Actual : 0]
[exec] Failures !!!
[exec] Run: 67 Failure total: 1 Failures: 1 Errors: 0
[exec] FAIL: zktest-mt
[exe
292905 No Perforce job exists for this issue. 0 164120
3 years, 19 weeks, 5 days ago 0|i0sgbb:
ZooKeeper ZOOKEEPER-1593

Add Debian style /etc/default/zookeeper support to init script

Improvement Resolved Minor Not A Problem Unassigned Dirkjan Bussink Dirkjan Bussink 29/Nov/12 05:55   10/May/13 19:12 10/May/13 19:12 3.4.5   scripts   0 4   Debian Linux 6.0 In our configuration we use a different data directory for Zookeeper. The problem is that the current Debian init.d script has the default location hardcoded:

ZOOPIDDIR=/var/lib/zookeeper/data
ZOOPIDFILE=${ZOOPIDDIR}/zookeeper_server.pid

By using the standard Debian practice of allowing for a /etc/default/zookeeper we can redefine these variables to point to the correct location:

ZOOPIDDIR=/var/lib/zookeeper/data
ZOOPIDFILE=${ZOOPIDDIR}/zookeeper_server.pid

[ -r /etc/default/zookeeper ] && . /etc/default/zookeeper

This currently can't be done through /usr/libexec/zkEnv.sh, since that script is loaded before ZOOPIDDIR and ZOOPIDFILE are set. Any setup made there (for example, in /etc/zookeeper/zookeeper-env.sh) would therefore be undone when the init script sets these variables.


292749 No Perforce job exists for this issue. 1 163395
6 years, 45 weeks, 6 days ago Patch for supporting /etc/default/zookeeper in Debian init script 0|i0sbuf:
ZooKeeper ZOOKEEPER-1592

support deleting a node silently

Improvement Open Major Unresolved Unassigned Jimmy Xiang Jimmy Xiang 27/Nov/12 22:52   20/Dec/13 15:03           0 1   Sometimes we want to delete a node without knowing whether it exists. In that case, we want the delete method to succeed instead of throwing a NoNodeException. Although we can write a wrapper method to do this, it would be better to build it into ZK. 292504 No Perforce job exists for this issue. 1 162009
7 years, 15 weeks ago 0|i0s3af:
ZooKeeper ZOOKEEPER-1591

Windows build is broken because inttypes.h doesn't exist

Bug Resolved Major Fixed Marshall McMullen Michi Mutsuzaki Michi Mutsuzaki 27/Nov/12 18:32   01/Dec/12 06:03 30/Nov/12 15:44 3.5.0 3.5.0 c client   0 4   Windows addrvec.h includes inttypes.h, but it is not present in the windows build environment.

https://builds.apache.org/job/ZooKeeper-trunk-WinVS2008/596/console

f:\hudson\hudson-slave\workspace\zookeeper-trunk-winvs2008\trunk\src\c\src\addrvec.h(22): fatal error C1083: Cannot open include file: 'inttypes.h': No such file or directory
292480 No Perforce job exists for this issue. 1 161985
7 years, 16 weeks, 5 days ago 0|i0s353:
ZooKeeper ZOOKEEPER-1590

Patch to add zk.updateServerList(newServerList) broke the build

Bug Resolved Blocker Fixed Flavio Paiva Junqueira Flavio Paiva Junqueira Flavio Paiva Junqueira 27/Nov/12 15:08   28/Nov/12 06:07 28/Nov/12 02:18 3.5.0 3.5.0     0 3   Here is the related output of jenkins:

{noformat}
validate-xdocs:
[exec] /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/trunk/src/docs/src/documentation/content/xdocs/zookeeperProgrammers.xml:578:5: The element type "para" must be terminated by the matching end-tag "</para>".
[exec]
[exec] BUILD FAILED
[exec] /home/jenkins/tools/forrest/latest/main/targets/validate.xml:135: Could not validate document /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/trunk/src/docs/src/documentation/content/xdocs/zookeeperProgrammers.xml
[exec]
{noformat}
292435 No Perforce job exists for this issue. 1 161205
7 years, 17 weeks, 1 day ago
Reviewed
0|i0rybr:
ZooKeeper ZOOKEEPER-1589

Documentation list has wrong numbering

Bug Resolved Minor Invalid Mahadev Konar Flavio Paiva Junqueira Flavio Paiva Junqueira 22/Nov/12 16:07   28/Nov/12 01:41 28/Nov/12 01:41         0 1   Check the version numbers of the documentation links on the project front page:

{noformat}
Release 3.4.5(stable)
Release 3.4.5(current)
{noformat}
259597 No Perforce job exists for this issue. 0 125186
7 years, 17 weeks, 1 day ago 0|i0lrzr:
ZooKeeper ZOOKEEPER-1588

Write Mechanism of Apache Zookeeper and Neociclo Accord

Test Resolved Blocker Not A Problem Unassigned CHANDAN BAGAI CHANDAN BAGAI 22/Nov/12 12:19   22/Nov/12 12:28 22/Nov/12 12:28     tests   0 2   259581 No Perforce job exists for this issue. 0 125170
7 years, 18 weeks ago 0|i0lrw7:
ZooKeeper ZOOKEEPER-1587

Provide simple way to determine IP address of an ephemeral znode's owner

Improvement Open Major Unresolved Unassigned Todd Lipcon Todd Lipcon 21/Nov/12 19:31   13/Dec/12 03:09   3.4.3       0 3   Occasionally I've run into operational cases where an ephemeral znode exists, and is held by some client, but it's not clear which client is the holder. By getting the znode from the shell, one can find the session ID, but as far as I'm aware the only way to reverse that to an IP is by grepping logs, etc. 259451 No Perforce job exists for this issue. 0 124740
7 years, 15 weeks ago 0|i0lp8v:
ZooKeeper ZOOKEEPER-1586

tarballs for zkfuse don't compile out of tree

Bug Patch Available Major Unresolved Raúl Gutiérrez Segalés Raúl Gutiérrez Segalés Raúl Gutiérrez Segalés 19/Nov/12 02:14   09/Oct/13 02:41   3.5.0   contrib-zkfuse   0 1   258548 No Perforce job exists for this issue. 1 119767
6 years, 24 weeks, 1 day ago 0|i0kujr:
ZooKeeper ZOOKEEPER-1585

make dist for src/c broken in trunk

Bug Resolved Major Fixed Raúl Gutiérrez Segalés Raúl Gutiérrez Segalés Raúl Gutiérrez Segalés 19/Nov/12 01:04   02/Mar/16 20:34 26/Nov/12 20:37 3.5.0 3.5.0 c client   0 4   make dist from trunk is failing because of a wrong reference to src/zookeeper_log.h (which exists in include/). 258541 No Perforce job exists for this issue. 1 119744
7 years, 17 weeks, 2 days ago 0|i0kuen:
ZooKeeper ZOOKEEPER-1584

Adding mvn-install target for deploying the zookeeper artifacts to .m2 repository.

Improvement Closed Minor Fixed Ashish Singh Ashish Singh Ashish Singh 14/Nov/12 16:27   13/Mar/14 14:16 14/Dec/12 19:46 3.4.3 3.4.6, 3.5.0 build   0 3   mvn install functionality for zookeeper distribution artifacts to .m2 is not present. 257871 No Perforce job exists for this issue. 1 118148
6 years, 2 weeks ago
Reviewed
0|i0kkk7:
ZooKeeper ZOOKEEPER-1583

Document maxClientCnxns in conf/zoo_sample.cfg

Improvement Closed Critical Fixed Christopher Tubbs Christopher Tubbs Christopher Tubbs 14/Nov/12 14:20   13/Mar/14 14:16 13/Dec/12 01:07 3.4.4 3.4.6, 3.5.0 documentation   0 4 300 300 0% It is silly that maxClientCnxns being set to the default, and that default being too low, is the number one issue for users (according to some: https://raw.github.com/strangeloop/strangeloop2012/master/slides/sessions/Ting-BuildingAnImpenetrableZooKeeper.pdf).

It seems to me that this can be resolved by an extremely simple documentation change: add a commented-out configuration line in conf/zoo_sample.cfg that shows the default, but more importantly, shows users that the configuration option exists.
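A commented-out line is pure documentation: zoo.cfg is a key=value file, and lines starting with '#' are ignored, so the default still applies. The sketch below reads the setting the same way a simple properties loader would. It assumes 60 as the 3.4.x default; check the administrator's guide for your release. `MaxClientCnxnsSketch` is a hypothetical helper, not ZooKeeper's own config parser (QuorumPeerConfig).

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

// Hypothetical helper: shows that "#maxClientCnxns=..." is a comment and leaves the default in effect.
public final class MaxClientCnxnsSketch {
    private MaxClientCnxnsSketch() {}

    // Assumption: 60 is the default in the 3.4.x line; verify for your version.
    public static final int ASSUMED_DEFAULT = 60;

    public static int maxClientCnxns(Reader cfg) throws IOException {
        Properties p = new Properties();
        p.load(cfg); // '#' lines are comments and are skipped
        String v = p.getProperty("maxClientCnxns", Integer.toString(ASSUMED_DEFAULT));
        return Integer.parseInt(v.trim());
    }
}
```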
0% 0% 300 300 configuration, documentation, example 257850 No Perforce job exists for this issue. 2 118126
6 years, 2 weeks ago
Reviewed
0|i0kkfb:
ZooKeeper ZOOKEEPER-1582

EndOfStreamException: Unable to read additional data from client

Bug Resolved Blocker Duplicate Unassigned Yanming Zhou Yanming Zhou 13/Nov/12 04:28   21/Nov/18 07:19 14/Dec/12 14:28         0 18   windows 7
jdk 7
1.download zookeeper-3.4.4.tar.gz and unzip
2.rename conf/zoo_sample.cfg to zoo.cfg
3.click zkServer.cmd
4.click zkCli.cmd

zkCli cannot connect to zkServer; it blocks.
The zkServer console prints:

2012-11-13 17:28:05,302 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x13af9131eee0000, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:722)
2012-11-13 17:28:05,308 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /127.0.0.1:54810 which had sessionid 0x13af9131eee0000
257346 No Perforce job exists for this issue. 0 114430
1 year, 17 weeks, 1 day ago 0|i0jxxr:
ZooKeeper ZOOKEEPER-1581

change copyright in notice to 2012

Bug Closed Major Fixed Benjamin Reed Benjamin Reed Benjamin Reed 08/Nov/12 09:56   13/Mar/14 14:17 12/Dec/12 02:00   3.3.7, 3.4.6, 3.5.0 build   0 4   it's 2012 so the copyright in notice.txt should end with 2012 255955 No Perforce job exists for this issue. 1 90828
6 years, 2 weeks ago
Reviewed
0|i0fwa7:
ZooKeeper ZOOKEEPER-1580

QuorumPeer.setRunning is not used

Bug Resolved Minor Fixed maoling Flavio Paiva Junqueira Flavio Paiva Junqueira 08/Nov/12 06:13   30/Jan/18 05:44 30/Jan/18 02:07 3.5.3, 3.4.11, 3.6.0 3.5.4, 3.6.0     0 5   setRunning is a public method and a search did not indicate that it is used anywhere, not even in tests. In fact, I believe we should not change "running" freely and we should only do it when calling shutdown. 255933 No Perforce job exists for this issue. 0 90791
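The suggestion that "running" should only change on shutdown can be sketched as follows. The flag stays private and volatile, and the only way to clear it is `shutdown()`; there is no public setter. `PeerLifecycleSketch` is a hypothetical class, not QuorumPeer.

```java
// Hypothetical sketch: the running flag can only be cleared through shutdown().
public final class PeerLifecycleSketch {
    // volatile so a worker thread polling isRunning() sees the change made by shutdown()
    private volatile boolean running = true;

    public boolean isRunning() {
        return running;
    }

    /** The single place where running becomes false; deliberately no setRunning(). */
    public void shutdown() {
        running = false;
        // A real peer would also close sockets and stop its sub-components here.
    }
}
```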
2 years, 7 weeks, 2 days ago 0|i0fw1z:
ZooKeeper ZOOKEEPER-1579

Compile error of UnixOperationSystemMXBean with open JDK

Bug Open Major Unresolved Michelle Chen Michelle Chen Michelle Chen 07/Nov/12 23:06   31/Oct/13 12:22   3.3.4, 3.4.3       1 7 604800 604800 0% zookeeper invokes getOpenFileDescriptorCount() function in com.sun.management.UnixOperatingSystemMXBean, which only exists in SUN JDK, and open JDK did not implement this function.

[javac] /root/zookeeper-3.3.4/src/java/test/org/apache/zookeeper/test/ClientBase.java:57: package com.sun.management does not exist
[javac] import com.sun.management.UnixOperatingSystemMXBean;
[javac] ^
[javac] /root/zookeeper-3.3.4/src/java/test/org/apache/zookeeper/test/QuorumBase.java:39: package com.sun.management does not exist
[javac] import com.sun.management.UnixOperatingSystemMXBean;
[javac] ^
[javac] /root/zookeeper-3.3.4/src/java/test/org/apache/zookeeper/test/ClientTest.java:48: package com.sun.management does not exist
[javac] import com.sun.management.UnixOperatingSystemMXBean;
[javac] ^
[javac] /root/zookeeper-3.3.4/src/java/test/org/apache/zookeeper/test/QuorumUtil.java:39: package com.sun.management does not exist
[javac] import com.sun.management.UnixOperatingSystemMXBean;
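One way to remove the compile-time dependency is to look the interface up reflectively and fall back to -1 when it is missing. The tests would then compile on any JDK. This is a minimal sketch: `OpenFdCounter` is a hypothetical helper, and -1 as the "unavailable" value is an assumption.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.lang.reflect.Method;

// Hypothetical helper: no compile-time import of com.sun.management.
public final class OpenFdCounter {
    private OpenFdCounter() {}

    /** Returns the open file descriptor count, or -1 if the platform bean does not expose it. */
    public static long openFileDescriptorCount() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        try {
            // Resolve the interface at run time, so a JDK without it still compiles and runs this code.
            Class<?> unixBean = Class.forName("com.sun.management.UnixOperatingSystemMXBean");
            if (!unixBean.isInstance(os)) {
                return -1L; // e.g. Windows, or a JVM whose bean lacks the method
            }
            Method m = unixBean.getMethod("getOpenFileDescriptorCount");
            return (Long) m.invoke(os);
        } catch (ReflectiveOperationException e) {
            return -1L;
        }
    }
}
```

Calling the method through the public interface rather than the implementation class avoids access problems with non-exported JDK internals.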
0% 0% 604800 604800 patch 255891 No Perforce job exists for this issue. 0 90720
6 years, 21 weeks ago
Reviewed
0|i0fvm7:
ZooKeeper ZOOKEEPER-1578

org.apache.zookeeper.server.quorum.Zab1_0Test failed due to hard code with 33556 port

Bug Closed Major Fixed Michelle Chen Michelle Chen Michelle Chen 07/Nov/12 22:55   13/Mar/14 14:17 17/Dec/12 02:13 3.4.3 3.4.6, 3.5.0     0 6 86400 86400 0% org.apache.zookeeper.server.quorum.Zab1_0Test fails with both the Sun JDK and OpenJDK.

[junit] Running org.apache.zookeeper.server.quorum.Zab1_0Test
[junit] Tests run: 8, Failures: 0, Errors: 1, Time elapsed: 18.334 sec
[junit] Test org.apache.zookeeper.server.quorum.Zab1_0Test FAILED


Zab1_0Test log:
2012-07-11 23:17:15,579 [myid:] - INFO [main:Leader@427] - Shutdown called
java.lang.Exception: shutdown Leader! reason: end of test
at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:427)
at org.apache.zookeeper.server.quorum.Zab1_0Test.testLastAcceptedEpoch(Zab1_0Test.java:211)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:48)


2012-07-11 23:17:15,584 [myid:] - ERROR [main:Leader@139] - Couldn't bind to port 33556
java.net.BindException: Address already in use
at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:402)
at java.net.ServerSocket.bind(ServerSocket.java:328)
at java.net.ServerSocket.bind(ServerSocket.java:286)
at org.apache.zookeeper.server.quorum.Leader.<init>(Leader.java:137)
at org.apache.zookeeper.server.quorum.Zab1_0Test.createLeader(Zab1_0Test.java:810)
at org.apache.zookeeper.server.quorum.Zab1_0Test.testLeaderInElectingFollowers(Zab1_0Test.java:224)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

2012-07-11 23:17:20,202 [myid:] - ERROR [LearnerHandler-bdvm039.svl.ibm.com/9.30.122.48:40153:LearnerHandler@559] - Unexpected exception causing shutdown while sock still open
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at java.io.DataInputStream.readInt(DataInputStream.java:370)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:291)
2012-07-11 23:17:20,203 [myid:] - WARN [LearnerHandler-bdvm039.svl.ibm.com/9.30.122.48:40153:LearnerHandler@569] - ******* GOODBYE bdvm039.svl.ibm.com/9.30.122.48:40153 ********
2012-07-11 23:17:20,204 [myid:] - INFO [Thread-20:Leader@421] - Shutting down
2012-07-11 23:17:20,204 [myid:] - INFO [Thread-20:Leader@427] - Shutdown called
java.lang.Exception: shutdown Leader! reason: lead ended

This failure suggests that port 33556 is already in use, but checking with a command shows that it is not. The port is hard-coded in the unit test; we can improve this with a code patch.
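A common alternative to a hard-coded port is to let the OS pick a free ephemeral port by binding to port 0. This is a minimal sketch, not the patch that was committed. There is a small window between closing the probe socket and the test binding the port, in which another process could take it. `FreePort` is a hypothetical helper.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical test helper: ask the OS for a currently unused port instead of hard-coding one.
public final class FreePort {
    private FreePort() {}

    public static int find() throws IOException {
        // Binding to port 0 makes the OS choose a free ephemeral port.
        try (ServerSocket s = new ServerSocket(0)) {
            return s.getLocalPort();
        }
    }
}
```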
0% 0% 86400 86400 patch 255890 No Perforce job exists for this issue. 2 90719
6 years, 2 weeks ago
Reviewed
0|i0fvlz:
ZooKeeper ZOOKEEPER-1577

Update website with info on how to report security bugs

Task Open Minor Unresolved Unassigned Eli Collins Eli Collins 07/Nov/12 22:46   07/Nov/12 22:46       documentation   0 1   The website should be updated with information on how to report potential security vulnerabilities. In Hadoop land we have a private security list that anyone can post to, which we point to on our lists page: Hadoop example http://hadoop.apache.org/general_lists.html#Security. 255888 No Perforce job exists for this issue. 0 90716
7 years, 20 weeks ago 0|i0fvlb:
ZooKeeper ZOOKEEPER-1576

Zookeeper cluster - failed to connect to cluster if one of the provided IPs causes java.net.UnknownHostException

Bug Resolved Major Fixed Edward Ribeiro Tally Tsabary Tally Tsabary 07/Nov/12 06:11   19/Dec/18 06:43 28/Jun/14 11:52 3.5.0 3.5.0 server   2 18   Three 3.4.3 zookeeper servers in cluster, linux. Using a cluster of three 3.4.3 zookeeper servers.
All the servers are up, but on the client machine, the firewall is blocking one of the servers.
The following exception occurs, and the client does not connect to any of the other cluster members.

The exception:
Nov 02, 2012 9:54:32 PM com.netflix.curator.framework.imps.CuratorFrameworkImpl logError
SEVERE: Background exception was not retry-able or retry gave up
java.net.UnknownHostException: scnrmq003.myworkday.com
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$1.lookupAllHostAddr(Unknown Source)
at java.net.InetAddress.getAddressesFromNameService(Unknown Source)
at java.net.InetAddress.getAllByName0(Unknown Source)
at java.net.InetAddress.getAllByName(Unknown Source)
at java.net.InetAddress.getAllByName(Unknown Source)
at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:60)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:440)
at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:375)

The code at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:60) is:
public StaticHostProvider(Collection<InetSocketAddress> serverAddresses) throws UnknownHostException {
    for (InetSocketAddress address : serverAddresses) {
        InetAddress resolvedAddresses[] = InetAddress.getAllByName(address.getHostName());
        for (InetAddress resolvedAddress : resolvedAddresses) {
            this.serverAddresses.add(new InetSocketAddress(resolvedAddress.getHostAddress(), address.getPort()));
        }
    }
......

If InetAddress.getAllByName(address.getHostName()) throws an UnknownHostException, the for-loop does not try to resolve the rest of the servers on the list,
and client connection creation fails.
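The behaviour expected here can be sketched as a resolver that catches UnknownHostException per host and keeps going with the rest of the list. This is only a sketch, not the actual StaticHostProvider fix: the HostLookup interface, the TolerantHostResolver class and the example host names are hypothetical, introduced so a fake DNS can be injected.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical lookup hook so a test can simulate DNS failures;
// production code would pass InetAddress::getAllByName.
interface HostLookup {
    InetAddress[] lookup(String host) throws UnknownHostException;
}

final class TolerantHostResolver {
    // Resolves every configured server independently. A host that fails to
    // resolve is skipped (it could be retried later) instead of aborting
    // construction of the whole server list.
    static List<InetSocketAddress> resolveAll(Collection<InetSocketAddress> servers,
                                              HostLookup dns) {
        List<InetSocketAddress> resolved = new ArrayList<>();
        for (InetSocketAddress address : servers) {
            try {
                for (InetAddress ia : dns.lookup(address.getHostString())) {
                    resolved.add(new InetSocketAddress(ia, address.getPort()));
                }
            } catch (UnknownHostException e) {
                System.err.println("Skipping unresolvable server "
                        + address.getHostString() + ": " + e.getMessage());
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Fake DNS: "down.example" is unresolvable, every other name maps to 127.0.0.1.
        HostLookup fakeDns = host -> {
            if (host.equals("down.example")) {
                throw new UnknownHostException(host);
            }
            return new InetAddress[] {
                InetAddress.getByAddress(host, new byte[] {127, 0, 0, 1})
            };
        };
        List<InetSocketAddress> servers = Arrays.asList(
                InetSocketAddress.createUnresolved("zk1.example", 2181),
                InetSocketAddress.createUnresolved("down.example", 2181),
                InetSocketAddress.createUnresolved("zk3.example", 2181));
        System.out.println(resolveAll(servers, fakeDns).size()); // prints 2
    }
}
```

In a real client the lookup would be InetAddress::getAllByName; the skipped hosts are the ones that could be re-resolved later.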


I expected the connection to be created to the other members of the cluster.
Also, InetAddress.getAllByName is a blocking call; if it takes a very long time (longer than the defined timeout), that should also not stop us from trying to connect to the other servers on the list.
Assuming this is fixed and we get a connection to the currently available servers, I think ZooKeeper should keep retrying the servers it could not connect to, so it can use them later when they are back.
If one of the servers on the list is not available during connection creation, it should be retried every x time despite the fact that we

255723 No Perforce job exists for this issue. 5 90498
2 years, 33 weeks, 4 days ago 0|i0fu8v:
ZooKeeper ZOOKEEPER-1575

adding .gitattributes to prevent CRLF and LF mismatches for source and text files

Bug Resolved Major Fixed Raja Aluri Raja Aluri Raja Aluri 06/Nov/12 20:31   04/Apr/14 07:12 03/Apr/14 21:36   3.4.7, 3.5.0     0 4   adding .gitattributes to prevent CRLF and LF mismatches for source and text files 255613 No Perforce job exists for this issue. 1 90264
5 years, 50 weeks, 6 days ago 0|i0fssv:
ZooKeeper ZOOKEEPER-1574

mismatched CR/LF endings in text files

Improvement Resolved Minor Fixed Raja Aluri Raja Aluri Raja Aluri 06/Nov/12 20:21   09/Apr/14 23:22 03/Apr/14 21:31 3.4.6, 3.5.0 3.4.7, 3.5.0     0 5   The source code in the zookeeper repo has a number of files with CRLF line endings.
With more development happening on Windows, there is a higher chance of more CRLF files getting into the source tree.
I would like to avoid that by creating a .gitattributes file that prevents text files from having CRLF line endings.
Before adding the .gitattributes file, however, we need to normalize the existing tree, so that people who sync after the .gitattributes change won't end up with a bunch of modified files in their workspace.
I am adding a couple of links here to give more of a primer on what exactly the issue is and how we are trying to fix it.
[http://git-scm.com/docs/gitattributes#_checking_out_and_checking_in]
[http://stackoverflow.com/questions/170961/whats-the-best-crlf-handling-strategy-with-git]
I will submit a separate bug and patch for .gitattributes
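The normalization step amounts to rewriting CRLF as LF in each text file before the .gitattributes file is committed. A minimal Java sketch, assuming UTF-8 text files and that the caller excludes binary files; LineEndingNormalizer is a hypothetical name, not part of the patch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class LineEndingNormalizer {
    // True if the text contains at least one Windows-style CRLF line ending.
    static boolean hasCrlf(String text) {
        return text.contains("\r\n");
    }

    // Converts CRLF line endings to LF; lone CR characters are left untouched.
    static String toLf(String text) {
        return text.replace("\r\n", "\n");
    }

    // Rewrites one file in place if it has CRLF endings; returns true if it changed.
    // Assumes UTF-8 text; binary files must be filtered out by the caller.
    static boolean normalizeFile(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        if (!hasCrlf(text)) {
            return false;
        }
        Files.write(file, toLf(text).getBytes(StandardCharsets.UTF_8));
        return true;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path p = Paths.get(arg);
            if (normalizeFile(p)) {
                System.out.println("normalized " + p);
            }
        }
    }
}
```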
255612 No Perforce job exists for this issue. 3 90263
5 years, 50 weeks ago 0|i0fssn:
ZooKeeper ZOOKEEPER-1573

Unable to load database due to missing parent node

Bug Closed Critical Fixed Vinayakumar B Thawan Kooburat Thawan Kooburat 01/Nov/12 19:13   13/Mar/14 14:17 10/Feb/14 15:53 3.4.3, 3.5.0 3.4.6, 3.5.0 server   0 12   While replaying the txnlog onto the data tree, the server has code to detect a missing parent node. This code block was last modified as part of ZOOKEEPER-1333. In production, we found a case where this check returns a false positive.

The sequence of txns is as follows:

zxid 1: create /prefix/a
zxid 2: create /prefix/a/b
zxid 3: delete /prefix/a/b
zxid 4: delete /prefix/a

The server starts capturing a snapshot at zxid 1. However, by the time it traverses the data tree down to /prefix, txn 4 has already been applied and /prefix has no children.

When the server restores from the snapshot, it processes the txnlog starting from zxid 2. This txn generates a missing-parent error and the server refuses to start up.

The same check allowed me to discover the bug in ZOOKEEPER-1551, but I don't know if we have any option besides removing this check to solve this issue.
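The race can be reproduced with a toy model in which the data tree is a set of paths and the snapshot is "fuzzy": labelled zxid 1 but serialized after txns 2 to 4 were applied. FuzzySnapshotReplay and its strict missing-parent check are hypothetical simplifications of the restore path, not ZooKeeper code. Replaying from zxid 2 reports a missing parent even though the full log is consistent.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class FuzzySnapshotReplay {
    // Toy txn: a create or delete of a single znode path.
    static final class Txn {
        final long zxid;
        final boolean create;
        final String path;

        Txn(long zxid, boolean create, String path) {
            this.zxid = zxid;
            this.create = create;
            this.path = path;
        }
    }

    static String parentOf(String path) {
        int i = path.lastIndexOf('/');
        return i == 0 ? "/" : path.substring(0, i);
    }

    // Replays txns with zxid >= fromZxid onto the tree. Returns the zxid of the
    // first create whose parent is missing (the strict check), or -1 if none.
    static long replay(Set<String> tree, List<Txn> log, long fromZxid) {
        for (Txn t : log) {
            if (t.zxid < fromZxid) {
                continue;
            }
            if (t.create) {
                if (!tree.contains(parentOf(t.path))) {
                    return t.zxid;
                }
                tree.add(t.path);
            } else {
                tree.remove(t.path);
            }
        }
        return -1;
    }

    static final List<Txn> LOG = Arrays.asList(
            new Txn(1, true, "/prefix/a"),
            new Txn(2, true, "/prefix/a/b"),
            new Txn(3, false, "/prefix/a/b"),
            new Txn(4, false, "/prefix/a"));

    public static void main(String[] args) {
        // Fuzzy snapshot labelled zxid 1: /prefix already has no children.
        Set<String> fuzzy = new HashSet<>(Arrays.asList("/", "/prefix"));
        System.out.println(replay(fuzzy, LOG, 2)); // prints 2: spurious "missing parent"

        // The same log replayed from a consistent pre-zxid-1 state succeeds.
        Set<String> consistent = new HashSet<>(Arrays.asList("/", "/prefix"));
        System.out.println(replay(consistent, LOG, 1)); // prints -1
    }
}
```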
253905 No Perforce job exists for this issue. 5 81579
6 years, 2 weeks ago
Reviewed
0|i0eb7r:
ZooKeeper ZOOKEEPER-1572

Add an async interface for multi request

Improvement Resolved Major Fixed Sijie Guo Sijie Guo Sijie Guo 01/Nov/12 03:57   30/Jul/15 03:51 03/Feb/13 10:36 3.4.5 3.5.0 java client   0 9   ZOOKEEPER-2237 Currently there is no async interface for multi request in ZooKeeper java client. review 253561 No Perforce job exists for this issue. 3 79004
4 years, 34 weeks ago 0|i0dvbr:
ZooKeeper ZOOKEEPER-1571

Allow QuorumUtil.java build with IBM Java

Improvement Resolved Major Duplicate Unassigned Paulo Ricardo Paz Vital Paulo Ricardo Paz Vital 30/Oct/12 11:22   01/May/13 22:29 28/Nov/12 06:21 3.4.4 3.4.4 tests   0 1   Linux (x86_64), RHEL 6.3, IBM Java 6 SR 11 The org.apache.zookeeper.test.QuorumUtil class imports the com.sun.management.UnixOperatingSystemMXBean class, so it fails to build when using IBM Java 6 SR 11. This issue is resolved by the new OSMXBean class proposed in ZOOKEEPER-1474.

The class OSMXBean (org.apache.zookeeper.server.util.OSMXBean) is a wrapper for the implementation of com.sun.management.UnixOperatingSystemMXBean, and decides to use the SUN API or its own implementation depending on the runtime (vendor) used.
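The runtime choice described above can be illustrated without a compile-time reference to com.sun.management by loading UnixOperatingSystemMXBean reflectively and falling back when it is absent. OpenFdCounter is a hypothetical illustration, not the actual OSMXBean code; the /proc/self/fd fallback assumes Linux, and the multi-catch requires Java 7 or later.

```java
import java.io.File;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.lang.reflect.Method;

final class OpenFdCounter {
    // Returns the number of open file descriptors, or -1 if it cannot be determined.
    static long openFileDescriptorCount() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        try {
            // Look up the Sun/Oracle interface by name so this class compiles and
            // loads even on runtimes (e.g. IBM Java) that do not ship it.
            Class<?> unixBean = Class.forName("com.sun.management.UnixOperatingSystemMXBean");
            if (unixBean.isInstance(os)) {
                Method m = unixBean.getMethod("getOpenFileDescriptorCount");
                return ((Number) m.invoke(os)).longValue();
            }
        } catch (ReflectiveOperationException | RuntimeException e) {
            // Interface missing or not accessible: use the fallback below.
        }
        // Fallback, assuming Linux: count the entries in /proc/self/fd.
        File[] fds = new File("/proc/self/fd").listFiles();
        return fds == null ? -1 : fds.length;
    }

    public static void main(String[] args) {
        System.out.println("open fds: " + openFileDescriptorCount());
    }
}
```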
test 253160 No Perforce job exists for this issue. 1 75887
7 years, 17 weeks, 1 day ago 0|i0dc33:
ZooKeeper ZOOKEEPER-1570

Allow QuorumBase.java build with IBM Java

Improvement Resolved Major Duplicate Unassigned Paulo Ricardo Paz Vital Paulo Ricardo Paz Vital 30/Oct/12 11:20   01/May/13 22:29 28/Nov/12 06:20 3.4.4 3.4.4 tests   0 1   Linux, RHEL 6.3, IBM Java 6 SR 11 The org.apache.zookeeper.test.QuorumBase class imports the com.sun.management.UnixOperatingSystemMXBean class, so it fails to build when using IBM Java 6 SR 11. This issue is resolved by the new OSMXBean class proposed in ZOOKEEPER-1474.

The class OSMXBean (org.apache.zookeeper.server.util.OSMXBean) is a wrapper for the implementation of com.sun.management.UnixOperatingSystemMXBean, and decides to use the SUN API or its own implementation depending on the runtime (vendor) used.
test 253159 No Perforce job exists for this issue. 1 75886
7 years, 17 weeks, 1 day ago 0|i0dc2v:
ZooKeeper ZOOKEEPER-1569

support upsert: setData if the node exists, otherwise, create a new node

Improvement Open Major Unresolved Unassigned Jimmy Xiang Jimmy Xiang 23/Oct/12 13:47   20/Dec/13 15:03           1 3   Currently, ZooKeeper supports setData and create. It would be great if it could also support an upsert, as in SQL. 250604 No Perforce job exists for this issue. 3 62043
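Until such an operation exists, a client can approximate an upsert by trying setData and falling back to create, retrying if it races with another client's create. In the sketch below ZnodeStore, InMemoryStore and the two exception classes are hypothetical stand-ins for the ZooKeeper client and its NoNode/NodeExists errors, so the example runs without a server.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical exceptions standing in for ZooKeeper's NoNode / NodeExists errors.
class NoNodeException extends Exception {}
class NodeExistsException extends Exception {}

// Hypothetical minimal store with create/setData semantics like ZooKeeper's.
interface ZnodeStore {
    void create(String path, byte[] data) throws NodeExistsException;
    void setData(String path, byte[] data) throws NoNodeException;
}

// In-memory stand-in so the sketch runs without a ZooKeeper server.
final class InMemoryStore implements ZnodeStore {
    final ConcurrentMap<String, byte[]> nodes = new ConcurrentHashMap<>();

    @Override
    public void create(String path, byte[] data) throws NodeExistsException {
        if (nodes.putIfAbsent(path, data) != null) {
            throw new NodeExistsException();
        }
    }

    @Override
    public void setData(String path, byte[] data) throws NoNodeException {
        if (nodes.replace(path, data) == null) {
            throw new NoNodeException();
        }
    }
}

final class Upsert {
    // Client-side upsert: try setData first and fall back to create when the
    // node is missing. If another client creates the node in between, create
    // fails with NodeExists and the loop retries setData.
    static void upsert(ZnodeStore store, String path, byte[] data) {
        while (true) {
            try {
                store.setData(path, data);
                return;
            } catch (NoNodeException missing) {
                try {
                    store.create(path, data);
                    return;
                } catch (NodeExistsException raced) {
                    // Lost a race with a concurrent create; go round again.
                }
            }
        }
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        upsert(store, "/config", "v1".getBytes(StandardCharsets.UTF_8)); // creates
        upsert(store, "/config", "v2".getBytes(StandardCharsets.UTF_8)); // updates
        System.out.println(new String(store.nodes.get("/config"), StandardCharsets.UTF_8));
    }
}
```

This takes up to two round trips per call and, unlike a server-side upsert, is not a single atomic operation.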
7 years, 14 weeks, 3 days ago 0|i0azl3:
ZooKeeper ZOOKEEPER-1568

multi should have a non-transaction version

Improvement Open Major Unresolved Unassigned Jimmy Xiang Jimmy Xiang 23/Oct/12 13:43   20/Dec/13 15:04           0 4   Currently multi is transactional, i.e. all or none. Sometimes, however, we don't want that: we want all operations to be executed, and it is OK if some of them fail. We just need to know the result of each operation. 250603 No Perforce job exists for this issue. 2 62042
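A non-transactional variant amounts to running each operation independently and recording a per-operation result instead of aborting on the first failure. The sketch models operations as plain Callables; BestEffortBatch and Result are hypothetical names, not part of the ZooKeeper API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;

final class BestEffortBatch {
    // Outcome of one operation: either a value or the exception it threw.
    static final class Result<T> {
        final T value;
        final Exception error;

        Result(T value, Exception error) {
            this.value = value;
            this.error = error;
        }

        boolean ok() {
            return error == null;
        }
    }

    // Runs every operation in order, even after failures, and reports each outcome.
    // Unlike an all-or-nothing multi, earlier successes are NOT rolled back.
    static <T> List<Result<T>> runAll(List<? extends Callable<T>> ops) {
        List<Result<T>> results = new ArrayList<>();
        for (Callable<T> op : ops) {
            try {
                results.add(new Result<>(op.call(), null));
            } catch (Exception e) {
                results.add(new Result<>(null, e));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<Callable<String>> ops = Arrays.asList(
                () -> "created /a",
                () -> { throw new IllegalStateException("node exists: /b"); },
                () -> "deleted /c");
        for (Result<String> r : runAll(ops)) {
            System.out.println(r.ok() ? r.value : "failed: " + r.error.getMessage());
        }
    }
}
```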
7 years, 15 weeks ago 0|i0azkv:
ZooKeeper ZOOKEEPER-1567

JMX can't be disabled with zkEnv.sh

Bug Patch Available Major Unresolved Jakub Lekstan Jakub Lekstan Jakub Lekstan 17/Oct/12 14:34   17/Oct/12 15:09   3.4.4   scripts   0 1   zkServer.sh checks the JMX variables before "including" zkEnv.sh, so you cannot disable JMX from the scripts that zkEnv.sh "includes".

Patch included.
249356 No Perforce job exists for this issue. 1 57423
7 years, 23 weeks, 1 day ago 0|i0a73j:
ZooKeeper ZOOKEEPER-1566

progress quits due to zxid not in order

Bug Open Major Unresolved Unassigned Zhou wenjian Zhou wenjian 17/Oct/12 06:09   17/Oct/12 06:11           0 1   2012-10-17 15:04:28,006 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@116] - Got 0x3800000002 expected 0x3800000001
2012-10-17 15:04:28,007 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@116] - Got zxid 0x3800000001 expected 0x3800000003
2012-10-17 15:04:28,007 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@116] - Got zxid 0x3800000003 expected 0x3800000002
2012-10-17 15:04:28,009 - FATAL [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FollowerZooKeeperServer@112] - Committing zxid 0x3800000003 but next pending txn 0x3800000001
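Reading these log lines is easier once a zxid is split into its two halves: in ZooKeeper the high 32 bits of the 64-bit zxid carry the epoch and the low 32 bits a per-epoch counter (the layout ZooKeeper's ZxidUtils encodes). The helper below, with the hypothetical name Zxid, shows that all three txns belong to epoch 0x38 and arrived with counters 2, 1, 3, i.e. out of order.

```java
final class Zxid {
    // High 32 bits: leader epoch. Low 32 bits: counter within that epoch.
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    static long make(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    public static void main(String[] args) {
        // zxids in the order the follower received them in the log above.
        long[] received = {0x3800000002L, 0x3800000001L, 0x3800000003L};
        for (long z : received) {
            System.out.printf("0x%x -> epoch 0x%x, counter %d%n", z, epochOf(z), counterOf(z));
        }
    }
}
```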
249252 No Perforce job exists for this issue. 0 57307
7 years, 23 weeks, 1 day ago 0|i0a6dr:
ZooKeeper ZOOKEEPER-1565

Allow ClientTest.java build with IBM Java

Improvement Resolved Major Duplicate Unassigned Paulo Ricardo Paz Vital Paulo Ricardo Paz Vital 16/Oct/12 15:56   01/May/13 22:29 28/Nov/12 06:18 3.4.4 3.4.4 tests   0 1   Linux, RHEL 6.3, IBM Java 6 SR 11 The org.apache.zookeeper.test.ClientTest class imports the com.sun.management.UnixOperatingSystemMXBean class, so it fails to build when using IBM Java 6 SR 11. This issue is resolved by the new OSMXBean class proposed in ZOOKEEPER-1474.

The class OSMXBean (org.apache.zookeeper.server.util.OSMXBean) is a wrapper for the implementation of com.sun.management.UnixOperatingSystemMXBean, and decides to use the SUN API or its own implementation depending on the runtime (vendor) used.
test 249113 No Perforce job exists for this issue. 1 57059
7 years, 17 weeks, 1 day ago 0|i0a4un:
ZooKeeper ZOOKEEPER-1564

Allow JUnit test build with IBM Java

Improvement Closed Major Fixed Paulo Ricardo Paz Vital Paulo Ricardo Paz Vital Paulo Ricardo Paz Vital 15/Oct/12 15:32   13/Mar/14 14:17 11/Dec/12 02:46 3.4.4, 3.4.5, 3.5.0 3.4.6, 3.5.0 tests   0 3   Linux, RHEL 6.3, IBM Java 6 SR 11 The org.apache.zookeeper.test.ClientBase, org.apache.zookeeper.test.ClientTest, org.apache.zookeeper.test.QuorumBase and org.apache.zookeeper.test.QuorumUtil classes import the com.sun.management.UnixOperatingSystemMXBean class, so they fail to build when using IBM Java 6 SR 11. This issue is resolved by the new OSMXBean class proposed in ZOOKEEPER-1474.

The class OSMXBean (org.apache.zookeeper.server.util.OSMXBean) is a wrapper for the implementation of com.sun.management.UnixOperatingSystemMXBean, and decides to use the SUN API or its own implementation depending on the runtime (vendor) used.
test 248796 No Perforce job exists for this issue. 3 56277
6 years, 2 weeks ago 0|i0a00v:
ZooKeeper ZOOKEEPER-1563

Wrong solution - unable to build under Windows with Visual Studio

Bug Resolved Major Fixed Unassigned Jakub Lekstan Jakub Lekstan 13/Oct/12 10:55   28/Dec/12 07:35 28/Dec/12 07:35 3.4.4   c client   0 3   Windows 7 x64
Visual Studio C++ 2010 Express
When I try to open zookeeper.sln, VS wants me to convert the project. While the conversion is taking place, I get a message:

"A file with the name: "[path]\zookeeper.vcxproj" already exists on disk.
Do you want to overwrite the project and its imported property sheets"

After that I get another message with the same text, but about Cli.vcxproj.

No matter whether I click yes or no, the conversion fails and both projects (Cli and zookeeper) are marked as unavailable.

If I close VS and open zookeeper.sln again, it asks to convert again; if I answer yes, the projects are again unavailable, but if I answer no, the projects are available but empty.
248459 No Perforce job exists for this issue. 0 55508
7 years, 12 weeks, 6 days ago visual studio 0|i09v9z:
ZooKeeper ZOOKEEPER-1562

Memory leaks in zoo_multi API

Bug Closed Trivial Fixed Deepak Jagtap Deepak Jagtap Deepak Jagtap 12/Oct/12 21:03   13/Mar/14 14:16 03/Feb/13 01:42 3.4.3, 3.4.4 3.4.6, 3.5.0 c client   0 6   Zookeeper client and server both are running on CentOS 6.3 Valgrind is reporting memory leak for zoo_multi operations.

==4056== 2,240 (160 direct, 2,080 indirect) bytes in 1 blocks are definitely lost in loss record 18 of 24
==4056== at 0x4A04A28: calloc (vg_replace_malloc.c:467)
==4056== by 0x504D822: create_completion_entry (zookeeper.c:2322)
==4056== by 0x5052833: zoo_amulti (zookeeper.c:3141)
==4056== by 0x5052A8B: zoo_multi (zookeeper.c:3240)

It looks like completion entries for individual operations in a multi-update transaction are not getting freed. My observation is that the size of the leak depends on the number of operations in a single multi-update transaction.
patch 248154 No Perforce job exists for this issue. 1 53975
6 years, 2 weeks ago The zoo_multi API used to leak memory while deserializing the response from the ZooKeeper server.
Completion entries for individual operations in a zoo_multi transaction weren't getting cleaned up, causing a memory leak. This patch resolves the leak by destroying the completion entries in the deserialize_multi function.
Reviewed
zoo_multi memory-leak 0|i09lv3:
ZooKeeper ZOOKEEPER-1561

Zookeeper client may hang on a server restart

Bug Resolved Major Duplicate Unassigned Jacky007 Jacky007 11/Oct/12 03:43   23/Dec/12 23:18 23/Dec/12 22:54 3.5.0 3.5.0 java client   1 3   In the doIO method of ClientCnxnSocketNIO
{noformat}
if (p != null) {
    outgoingQueue.removeFirstOccurrence(p);
    updateLastSend();
    if ((p.requestHeader != null) &&
            (p.requestHeader.getType() != OpCode.ping) &&
            (p.requestHeader.getType() != OpCode.auth)) {
        p.requestHeader.setXid(cnxn.getXid());
    }
    p.createBB();
    ByteBuffer pbb = p.bb;
    sock.write(pbb);
    if (!pbb.hasRemaining()) {
        sentCount++;
        if (p.requestHeader != null
                && p.requestHeader.getType() != OpCode.ping
                && p.requestHeader.getType() != OpCode.auth) {
            pending.add(p);
        }
    }
{noformat}
When sock.write(pbb) throws an exception, the packet is never cleaned up (it is in neither outgoingQueue nor pendingQueue). If the client waits for it, it will wait forever.
247277 No Perforce job exists for this issue. 0 46150
7 years, 13 weeks, 3 days ago It is fixed in ZOOKEEPER-1560. 0|i089kn:
ZooKeeper ZOOKEEPER-1560

Zookeeper client hangs on creation of large nodes

Bug Resolved Major Fixed Skye Wanderman-Milne Igor Motov Igor Motov 10/Oct/12 19:45   31/Oct/12 19:00 31/Oct/12 14:44 3.4.4, 3.5.0 3.4.5, 3.5.0 java client   0 12   To reproduce, try creating a node with 0.5 MB of data using the Java client. The test will hang waiting for a response from the server. See the attached patch for a test that reproduces the issue.

It seems that ZOOKEEPER-1437 introduced a few issues to {{ClientCnxnSocketNIO.doIO}} that prevent {{ClientCnxnSocketNIO}} from sending large packets that require several invocations of {{SocketChannel.write}} to complete. The first issue is that the call to {{outgoingQueue.removeFirstOccurrence(p);}} removes the packet from the queue even if the packet wasn't completely sent yet. It looks to me like this call should be moved under {{if (!pbb.hasRemaining())}}. The second issue is that {{p.createBB()}} reinitializes the {{ByteBuffer}} on every iteration, which confuses {{SocketChannel.write}}. The third issue is caused by extra calls to {{cnxn.getXid()}} that increment the xid on every iteration and confuse the server.
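The three fixes suggested above can be sketched with a simplified sender: peek at the head of the queue and dequeue only once the buffer is fully written, build the ByteBuffer once, and assign the xid once. PartialWriteSender, Packet and TrickleChannel are hypothetical names, the sketch does not serialize the xid into the payload, and it is not the committed patch. Because the packet stays in the queue until it is fully written, an exception from write no longer strands it either.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class PartialWriteSender {
    // Hypothetical packet: its buffer and xid are fixed on the first write attempt.
    static final class Packet {
        final byte[] payload;
        ByteBuffer bb;  // created once, then reused across partial writes
        int xid = -1;   // assigned once, not on every write attempt
        Packet(byte[] payload) { this.payload = payload; }
    }

    final Deque<Packet> outgoing = new ArrayDeque<>();
    final List<Packet> pending = new ArrayList<>();
    private int nextXid = 1;

    // Called each time the socket is writable; may send only part of a packet.
    void doWrite(WritableByteChannel sock) throws IOException {
        Packet p = outgoing.peekFirst();  // peek: do not dequeue yet
        if (p == null) {
            return;
        }
        if (p.bb == null) {
            p.xid = nextXid++;
            p.bb = ByteBuffer.wrap(p.payload);
        }
        sock.write(p.bb);  // if this throws, p is still in outgoing
        if (!p.bb.hasRemaining()) {
            outgoing.removeFirst();  // fully sent: now move it to pending
            pending.add(p);
        }
    }

    // Test channel that accepts at most maxPerWrite bytes per call.
    static final class TrickleChannel implements WritableByteChannel {
        final ByteArrayOutputStream received = new ByteArrayOutputStream();
        final int maxPerWrite;
        TrickleChannel(int maxPerWrite) { this.maxPerWrite = maxPerWrite; }

        @Override public int write(ByteBuffer src) {
            int n = Math.min(maxPerWrite, src.remaining());
            byte[] chunk = new byte[n];
            src.get(chunk);
            received.write(chunk, 0, n);
            return n;
        }
        @Override public boolean isOpen() { return true; }
        @Override public void close() {}
    }

    public static void main(String[] args) throws IOException {
        PartialWriteSender s = new PartialWriteSender();
        s.outgoing.add(new Packet(new byte[10]));
        TrickleChannel ch = new TrickleChannel(3);
        while (!s.outgoing.isEmpty()) {
            s.doWrite(ch);
        }
        System.out.println(ch.received.size() + " bytes, xid " + s.pending.get(0).xid);
    }
}
```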
247171 No Perforce job exists for this issue. 11 45345
7 years, 21 weeks, 1 day ago
Reviewed
0|i084lr:
ZooKeeper ZOOKEEPER-1559

ZOOKEEPER-1549 Learner should not snapshot uncommitted state

Sub-task Open Major Unresolved Hongchao Deng Flavio Paiva Junqueira Flavio Paiva Junqueira 06/Oct/12 09:46   02/Dec/14 21:50       quorum   0 5   The code in Learner.java is a bit entangled for backward compatibility reasons. We need to make sure that we can remove the calls to take a snapshot without breaking it. 244712 No Perforce job exists for this issue. 0 31349
5 years, 16 weeks, 1 day ago 0|i05q8v:
ZooKeeper ZOOKEEPER-1558

ZOOKEEPER-1549 Leader should not snapshot uncommitted state

Sub-task Closed Blocker Fixed Flavio Paiva Junqueira Flavio Paiva Junqueira Flavio Paiva Junqueira 06/Oct/12 09:45   13/Mar/14 14:17 19/Oct/13 06:06 3.4.6 3.4.6 quorum   0 5   The leader currently takes a snapshot when it calls loadData at the beginning of the lead() method. The loaded data, however, may contain uncommitted state. 244711 No Perforce job exists for this issue. 8 31348
6 years, 2 weeks ago 0|i05q8n:
ZooKeeper ZOOKEEPER-1557

jenkins jdk7 test failure in testBadSaslAuthNotifiesWatch

Bug Closed Major Fixed Eugene Joseph Koontz Patrick D. Hunt Patrick D. Hunt 04/Oct/12 19:03   13/Mar/14 14:17 23/Oct/13 21:10 3.4.5, 3.5.0 3.4.6, 3.5.0 server, tests   0 6   Failure of testBadSaslAuthNotifiesWatch on the jenkins jdk7 job:

https://builds.apache.org/job/ZooKeeper-trunk-jdk7/407/

haven't seen this before.
241704 No Perforce job exists for this issue. 4 11414
6 years, 2 weeks ago Committed to 3.4.6/trunk. Thanks Eugene. 0|i02b6v:
ZooKeeper ZOOKEEPER-1556

Memory leak reported by valgrind mt version

Bug Open Minor Unresolved Unassigned André Martin André Martin 03/Oct/12 15:10   24/Oct/17 00:39   3.4.4   c client   0 5   Valgrind reports the following memory leak when using the c-client (mt):

==11674== 18 bytes in 9 blocks are indirectly lost in loss record 14 of 173
==11674== at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11674== by 0xC8064A: ia_deserialize_string (recordio.c:271)
==11674== by 0xC81F2E: deserialize_String_vector (zookeeper.jute.c:247)
==11674== by 0xC842F9: deserialize_GetChildrenResponse (zookeeper.jute.c:874)
==11674== by 0xC7E9F0: zookeeper_process (zookeeper.c:1904)
==11674== by 0xC7FE5B: do_io (mt_adaptor.c:439)
==11674== by 0x4E39E99: start_thread (pthread_create.c:308)
==11674== by 0x5FA6DBC: clone (clone.S:112)
==11674==
==11674== 90 (72 direct, 18 indirect) bytes in 49 blocks are definitely lost in loss record 139 of 173
==11674== at 0x4C29DB4: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11674== by 0xC81EEE: deserialize_String_vector (zookeeper.jute.c:245)
==11674== by 0xC842F9: deserialize_GetChildrenResponse (zookeeper.jute.c:874)
==11674== by 0xC7E9F0: zookeeper_process (zookeeper.c:1904)
==11674== by 0xC7FE5B: do_io (mt_adaptor.c:439)
==11674== by 0x4E39E99: start_thread (pthread_create.c:308)
==11674== by 0x5FA6DBC: clone (clone.S:112)
242157 No Perforce job exists for this issue. 0 12704
2 years, 21 weeks, 2 days ago 0|i02j5j:
ZooKeeper ZOOKEEPER-1555

ACLs are not respected for node deletion

Bug Resolved Critical Not A Problem Unassigned Guillaume Nodet Guillaume Nodet 03/Oct/12 11:59   03/Oct/12 12:08 03/Oct/12 12:08 3.4.3       0 1   Any session can delete nodes with restricted ACLs. 242158 No Perforce job exists for this issue. 0 12705
7 years, 25 weeks, 1 day ago 0|i02j5r:
ZooKeeper ZOOKEEPER-1554

Can't use zookeeper client without SASL

Bug Closed Blocker Fixed Unassigned Guillaume Nodet Guillaume Nodet 03/Oct/12 11:35   13/Mar/14 14:17 30/Oct/13 00:21 3.4.4 3.4.6, 3.5.0     3 10   The ZooKeeperSaslClient correctly detects that it should not use SASL when nothing is configured; however, the SendThread waits forever because clientTunneledAuthenticationInProgress() returns true instead of false. 242159 No Perforce job exists for this issue. 0 12706
6 years, 2 weeks ago 0|i02j5z:
ZooKeeper ZOOKEEPER-1553

Findbugs configuration is missing some dependencies

Bug Closed Minor Fixed Sean Busbey Sean Busbey Sean Busbey 01/Oct/12 18:00   13/Mar/14 14:16 12/Dec/12 03:03 3.5.0 3.4.6, 3.5.0 build   0 4   While updating the findbugs configuration to account for a change in log4j versions I noticed findbugs complaining about access to the netty and slf4j classes.

Steps to reproduce:

# install findbugs to $FINDBUGS_HOME
# run ant -Dfindbugs.home="$FINDBUGS_HOME" findbugs

239567 No Perforce job exists for this issue. 1 2351
6 years, 2 weeks ago
Reviewed
0|i00r9z:
ZooKeeper ZOOKEEPER-1552

Enable sync request processor in Observer

Improvement Closed Major Fixed Flavio Paiva Junqueira Thawan Kooburat Thawan Kooburat 30/Sep/12 21:28   13/Mar/14 14:17 30/Sep/13 16:55 3.4.3 3.4.6, 3.5.0 quorum, server   0 8   The Observer doesn't forward its txns to SyncRequestProcessor, so it never persists the txns to disk or periodically creates snapshots. This increases start-up time, since the observer has to fetch the entire snapshot if it has been running for a long time.
239578 No Perforce job exists for this issue. 9 2366
6 years, 2 weeks ago 0|i00rdb:
ZooKeeper ZOOKEEPER-1551

Observers ignore txns that come after snapshot and UPTODATE

Bug Closed Blocker Fixed Thawan Kooburat Thawan Kooburat Thawan Kooburat 30/Sep/12 20:57   13/Mar/14 14:17 08/Oct/13 12:34 3.4.3 3.4.6, 3.5.0 quorum, server   2 8   In Learner.java, txns that come after the learner has taken the snapshot (after the NEWLEADER packet) are stored in packetsNotCommitted. The follower has special logic to apply these txns at the end of the syncWithLeader() method. However, the observer ignores these txns completely, causing data inconsistency. 239554 No Perforce job exists for this issue. 7 2333
6 years, 2 weeks ago 0|i00r5z:
ZooKeeper ZOOKEEPER-1550

ZooKeeperSaslClient does not finish anonymous login on OpenJDK

Bug Resolved Blocker Fixed Eugene Joseph Koontz Robert Macomber Robert Macomber 26/Sep/12 12:32   16/Jan/13 14:00 28/Sep/12 13:06 3.4.4 3.4.5 java client   0 6   On OpenJDK, {{javax.security.auth.login.Configuration.getConfiguration}} does not throw an exception. {{ZooKeeperSaslClient.clientTunneledAuthenticationInProgress}} uses an exception from that method as a proxy for "this client is not configured to use SASL" and as a result no commands can be sent, since it is still waiting for auth to complete.
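A check that does not depend on getConfiguration() throwing can instead ask the configuration for the login section and treat a missing or empty entry list as "SASL not configured", while still catching the SecurityException seen in the Oracle log below. This is a sketch under that assumption, not the committed fix; SaslConfigCheck is a hypothetical name, and "Client" is only used as a default section name in the example.

```java
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;

final class SaslConfigCheck {
    // Decides whether a JAAS login section is configured without relying on
    // getConfiguration() throwing: the Oracle JDK in this report threw a
    // SecurityException when no config existed, while OpenJDK did not.
    static boolean isSectionConfigured(String section) {
        try {
            AppConfigurationEntry[] entries =
                    Configuration.getConfiguration().getAppConfigurationEntry(section);
            return entries != null && entries.length > 0;
        } catch (SecurityException e) {
            return false;  // "Unable to locate a login configuration"
        }
    }

    public static void main(String[] args) {
        String section = args.length > 0 ? args[0] : "Client";
        System.out.println(section + " configured: " + isSectionConfigured(section));
    }
}
```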

[Link to mailing list discussion|http://comments.gmane.org/gmane.comp.java.zookeeper.user/2667]

The relevant bit of logs from OpenJDK and Oracle versions of 'connect and do getChildren("/")':

{code:title=OpenJDK}
INFO [main] 2012-09-25 14:02:24,545 com.socrata.Main Waiting for connection...
DEBUG [main] 2012-09-25 14:02:24,548 com.socrata.zookeeper.ZooKeeperProvider Waiting for connected-state...
INFO [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,576 org.apache.zookeeper.ClientCnxn Opening socket connection to server mike.local/10.0.2.106:2181. Will not attempt to authenticate using SASL (unknown error)
INFO [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,584 org.apache.zookeeper.ClientCnxn Socket connection established to mike.local/10.0.2.106:2181, initiating session
DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,586 org.apache.zookeeper.ClientCnxn Session establishment request sent on mike.local/10.0.2.106:2181
INFO [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,600 org.apache.zookeeper.ClientCnxn Session establishment complete on server mike.local/10.0.2.106:2181, sessionid = 0x139ff2e85b60005, negotiated timeout = 40000
DEBUG [main-EventThread] 2012-09-25 14:02:24,614 com.socrata.zookeeper.ZooKeeperProvider ConnectionStateChanged(Connected)
DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,636 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: clientPath:/ serverPath:/ finished:false header:: 0,12 replyHeader:: 0,0,0 request:: '/,F response:: v{} until SASL authentication completes.
DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:37,923 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: clientPath:/ serverPath:/ finished:false header:: 0,12 replyHeader:: 0,0,0 request:: '/,F response:: v{} until SASL authentication completes.
DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:37,924 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: clientPath:/ serverPath:/ finished:false header:: 0,12 replyHeader:: 0,0,0 request:: '/,F response:: v{} until SASL authentication completes.
DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:37,924 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: clientPath:null serverPath:null finished:false header:: -2,11 replyHeader:: null request:: null response:: nulluntil SASL authentication completes.
DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,260 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: clientPath:/ serverPath:/ finished:false header:: 0,12 replyHeader:: 0,0,0 request:: '/,F response:: v{} until SASL authentication completes.
DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,260 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: clientPath:null serverPath:null finished:false header:: -2,11 replyHeader:: null request:: null response:: nulluntil SASL authentication completes.
DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,261 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: clientPath:/ serverPath:/ finished:false header:: 0,12 replyHeader:: 0,0,0 request:: '/,F response:: v{} until SASL authentication completes.
DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,261 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: clientPath:null serverPath:null finished:false header:: -2,11 replyHeader:: null request:: null response:: nulluntil SASL authentication completes.
DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,261 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: clientPath:null serverPath:null finished:false header:: -2,11 replyHeader:: null request:: null response:: nulluntil SASL authentication completes.
DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,265 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: clientPath:/ serverPath:/ finished:false header:: 0,12 replyHeader:: 0,0,0 request:: '/,F response:: v{} until SASL authentication completes.
DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,265 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: clientPath:null serverPath:null finished:false header:: -2,11 replyHeader:: null request:: null response:: nulluntil SASL authentication completes.
DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,266 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: clientPath:null serverPath:null finished:false header:: -2,11 replyHeader:: null request:: null response:: nulluntil SASL authentication completes.
INFO [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,266 org.apache.zookeeper.ClientCnxn Client session timed out, have not heard from server in 26668ms for sessionid 0x139ff2e85b60005, closing socket connection and attempting reconnect
DEBUG [main-EventThread] 2012-09-25 14:02:51,377 com.socrata.zookeeper.ZooKeeperProvider ConnectionStateChanged(Disconnected)
{code}

{code:title=Oracle}
INFO [main] 2012-09-25 14:03:16,315 com.socrata.Main Waiting for connection...
DEBUG [main] 2012-09-25 14:03:16,319 com.socrata.zookeeper.ZooKeeperProvider Waiting for connected-state...
INFO [main-SendThread(10.0.2.106:2181)] 2012-09-25 14:03:16,335 org.apache.zookeeper.ClientCnxn Opening socket connection to server 10.0.2.106/10.0.2.106:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
INFO [main-SendThread(10.0.2.106:2181)] 2012-09-25 14:03:16,344 org.apache.zookeeper.ClientCnxn Socket connection established to 10.0.2.106/10.0.2.106:2181, initiating session
DEBUG [main-SendThread(10.0.2.106:2181)] 2012-09-25 14:03:16,346 org.apache.zookeeper.ClientCnxn Session establishment request sent on 10.0.2.106/10.0.2.106:2181
DEBUG [main-SendThread(10.0.2.106:2181)] 2012-09-25 14:03:16,347 org.apache.zookeeper.client.ZooKeeperSaslClient Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration
DEBUG [main-SendThread(10.0.2.106:2181)] 2012-09-25 14:03:16,351 org.apache.zookeeper.client.ZooKeeperSaslClient Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration
INFO [main-SendThread(10.0.2.106:2181)] 2012-09-25 14:03:16,368 org.apache.zookeeper.ClientCnxn Session establishment complete on server 10.0.2.106/10.0.2.106:2181, sessionid = 0x139ff2e85b60006, negotiated timeout = 40000
DEBUG [main-SendThread(10.0.2.106:2181)] 2012-09-25 14:03:16,371 org.apache.zookeeper.client.ZooKeeperSaslClient Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration
DEBUG [main-SendThread(10.0.2.106:2181)] 2012-09-25 14:03:16,371 org.apache.zookeeper.client.ZooKeeperSaslClient Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration
DEBUG [main-EventThread] 2012-09-25 14:03:16,385 com.socrata.zookeeper.ZooKeeperProvider ConnectionStateChanged(Connected)
DEBUG [main-SendThread(10.0.2.106:2181)] 2012-09-25 14:03:16,417 org.apache.zookeeper.client.ZooKeeperSaslClient Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration
DEBUG [main-SendThread(10.0.2.106:2181)] 2012-09-25 14:03:16,417 org.apache.zookeeper.client.ZooKeeperSaslClient Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration
DEBUG [main-SendThread(10.0.2.106:2181)] 2012-09-25 14:03:16,417 org.apache.zookeeper.client.ZooKeeperSaslClient Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration
DEBUG [main-SendThread(10.0.2.106:2181)] 2012-09-25 14:03:16,418 org.apache.zookeeper.client.ZooKeeperSaslClient Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration
DEBUG [main-SendThread(10.0.2.106:2181)] 2012-09-25 14:03:16,418 org.apache.zookeeper.client.ZooKeeperSaslClient Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration
DEBUG [main-SendThread(10.0.2.106:2181)] 2012-09-25 14:03:16,431 org.apache.zookeeper.client.ZooKeeperSaslClient Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration
DEBUG [main-SendThread(10.0.2.106:2181)] 2012-09-25 14:03:16,438 org.apache.zookeeper.client.ZooKeeperSaslClient Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration
DEBUG [main-SendThread(10.0.2.106:2181)] 2012-09-25 14:03:16,443 org.apache.zookeeper.ClientCnxn Reading reply sessionid:0x139ff2e85b60006, packet:: clientPath:/ serverPath:/ finished:false header:: 1,12 replyHeader:: 1,8292982,0 request:: '/,F response:: v{'ro,'row-index,'zkbtest,'consumers,'reindex,'hotstandby,'bigdir,'vs,'orestes,'eurybates,'shardedcly,'row-locks,'id-counter,'zookeeper,'cly,'locks,'rwlocks,'tickets,'brokers},s{0,0,0,0,0,61,0,0,0,19,8292893}
DEBUG [main-SendThread(10.0.2.106:2181)] 2012-09-25 14:03:16,444 org.apache.zookeeper.client.ZooKeeperSaslClient Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration
OK(Set(cly, row-locks, hotstandby, locks, tickets, bigdir, zkbtest, row-index, reindex, id-counter, eurybates, vs, rwlocks, shardedcly, brokers, consumers, zookeeper, orestes, ro),0,0,0,0,0,61,0,0,0,19,8292893)
{code}
242160 No Perforce job exists for this issue. 3 12707
7 years, 25 weeks, 6 days ago 0|i02j67:
ZooKeeper ZOOKEEPER-1549

Data inconsistency when follower is receiving a DIFF with a dirty snapshot

Bug Open Major Unresolved Flavio Paiva Junqueira Jacky007 Jacky007 10/Sep/12 03:58   05/Feb/20 07:15   3.4.3 3.7.0, 3.5.8 quorum   2 21   ZOOKEEPER-1558, ZOOKEEPER-1559, ZOOKEEPER-2020 The TRUNC code (from ZOOKEEPER-1154?) cannot work correctly if the snapshot is not correct.
Here is the scenario (similar to ZOOKEEPER-1154):
Initial Condition
1. Let's say there are three nodes in the ensemble, A, B and C, with A being the leader.
2. The current epoch is 7.
3. For simplicity of the example, let's say a zxid is a two-digit number, with the epoch being the first digit.
4. The zxid is 73.
5. All the nodes have seen change 73 and have persistently logged it.
Step 1
A request with zxid 74 is issued. The leader A writes it to its log, but the entire ensemble crashes and B and C never write change 74 to their logs.
Step 2
A and B restart and A is elected the new leader. A loads its data and takes a clean snapshot (change 74 is in it), then sends a DIFF to B, but B dies before syncing with A. A dies later.
Step 3
B and C restart; A is still down.
B and C form the quorum.
B is the new leader. Let's say B's minCommitLog is 71 and its maxCommitLog is 73.
The epoch is now 8 and the zxid is 80.
A request with zxid 81 is successful. On B, minCommitLog is now 71 and maxCommitLog is 81.
Step 4
A starts up. It applies the change in the request with zxid 74 to its in-memory data tree.
A contacts B to registerAsFollower and provides 74 as its zxid.
Since 71 <= 74 <= 81, B decides to send A the diff.
Problem:
The problem with the above sequence is that after truncating its log, A loads the snapshot again, and that snapshot is not correct: it still contains change 74.
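To make the Step 4 decision concrete, here is a simplified Java model of how a leader might choose between SNAP, TRUNC and DIFF for a reconnecting follower, using the scenario's two-digit zxids. It is an illustration under stated assumptions, not ZooKeeper's LearnerHandler code, and the class and method names are hypothetical. With B's log holding 71, 72, 73, 80 and 81 and A reporting 74, the model tells A to truncate back to 73 and then apply the diff. Truncation only trims A's transaction log, though: when A rebuilds its state it starts from the snapshot taken in Step 2, which already contains change 74.

```java
import java.util.List;

// Simplified model of a leader choosing how to sync a reconnecting follower.
// Zxids use the issue's two-digit notation (first digit = epoch); real zxids
// are 64-bit with the epoch in the high 32 bits. Not ZooKeeper's LearnerHandler.
class SyncDecisionSketch {

    /**
     * @param committedLog zxids in the leader's committed log, sorted ascending
     * @param peerZxid     last zxid reported by the follower
     * @return a description of what the leader would send
     */
    static String decide(List<Long> committedLog, long peerZxid) {
        long minCommitLog = committedLog.get(0);
        long maxCommitLog = committedLog.get(committedLog.size() - 1);
        if (peerZxid > maxCommitLog) {
            // Follower has txns the leader never committed: cut them off.
            return "TRUNC to " + maxCommitLog;
        }
        if (peerZxid < minCommitLog) {
            // Too old to be served from the log: ship a full snapshot.
            return "SNAP";
        }
        // minCommitLog <= peerZxid <= maxCommitLog: find the newest shared zxid.
        long common = minCommitLog;
        for (long zxid : committedLog) {
            if (zxid <= peerZxid) {
                common = zxid;
            }
        }
        if (common != peerZxid) {
            // peerZxid (e.g. 74) is not in the leader's log: truncate the
            // follower back to the shared point, then send the diff.
            return "TRUNC to " + common + ", then DIFF";
        }
        return "DIFF from " + peerZxid;
    }

    public static void main(String[] args) {
        // Step 4: B's log holds 71, 72, 73, 80, 81 and A reports 74.
        System.out.println(decide(List.of(71L, 72L, 73L, 80L, 81L), 74L));
        // prints "TRUNC to 73, then DIFF"
    }
}
```

The model deliberately says nothing about what the follower reloads after the TRUNC; that reload from the dirty Step 2 snapshot is exactly where the inconsistency described in this issue comes from.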

In the 3.3 branch, FileTxnSnapLog.restore does not call the listener (ZOOKEEPER-874), so the leader sends a snapshot to the follower and this is not a problem.
242161 No Perforce job exists for this issue. 3 12708
2 years, 32 weeks, 2 days ago 0|i02j6f:
ZooKeeper ZOOKEEPER-1548

Cluster fails election loop in new and interesting way

Bug Closed Major Duplicate Unassigned Alan Horn Alan Horn 07/Sep/12 17:51   13/Mar/14 14:16 29/Aug/13 10:22 3.4.3 3.4.6 leaderElection   0 6   Hi,

We have a five node cluster, recently upgraded from 3.3.5 to 3.4.3. Was running fine for a few weeks after the upgrade, then the following sequence of events occurred :

1. All servers stopped responding to 'ruok' at the same time
2. Our local supervisor process restarted all of them at the same time
(yes, this is bad, we didn't expect it to fail this way :)
3. The cluster would not serve requests after this. It appeared to be unable to complete an election.

We tried various things at this point, none of which worked:

* Moved around the restart order of the nodes (e.g. 4 thru 0, instead of 0 thru 4)
* Reduced number of running nodes from 5 -> 3 to simplify the quorum, by only starting up 0, 1 & 2, in one test, and 0, 2 & 4 in the other
* Removed the *Epoch files from version-2/ snapshot directory
* Put the same version-2/snapshot.xxxxx file on each server in the cluster
* Added the same last txlog to each server
* Kept only the last snapshot plus txlog unique on each server
* Changed leaderServes=no to leaderServes=yes
* Removed all files and started up with empty data as a control. This worked, but of course isn't terribly useful :)

Finally, I brought the data up on a single node running in standalone mode and this worked (yay!). So at this point we brought the single node back into service and have kept the other four available to debug why the election is failing.

We downgraded the four nodes to 3.3.5, and then they completed the election and started serving as expected.
We did a rolling upgrade to 3.4.3, and everything was fine until we restarted the leader, whereupon we encountered the same re-election loop as before.

We're a bit out of ideas at this point, so I was hoping someone from this list might have some useful input.

Output from two followers and a leader during this condition is attached.

Cheers,

Al
242162 No Perforce job exists for this issue. 3 12709
6 years, 2 weeks ago 0|i02j6n:
ZooKeeper ZOOKEEPER-1547

Test robustness of client using SASL in the presence of dropped requests

Improvement Open Major Unresolved Unassigned Eugene Joseph Koontz Eugene Joseph Koontz 04/Sep/12 16:42   06/Nov/12 00:13           0 2   ZK clients send SASL packets to ZK servers as request packets. However, what if the server does not respond to the client's SASL packets? In this scenario, the server does not actually close the connection to the client; it simply fails to respond to SASL requests. Make sure the client can cope with this behavior.
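The condition to test is a server that keeps the TCP connection open but never answers. Below is a minimal plain-socket sketch of that condition, independent of ZooKeeper's own test classes (all names are hypothetical). The "black hole" accepts a connection, reads and discards whatever the client sends, and never replies. The client side shows that only a read timeout lets the caller notice, because the socket never reports EOF.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical test helper: a server that swallows requests without replying
// and without closing the connection. Not part of ZooKeeper's test suite.
class BlackHoleServerSketch {

    /** Starts the black hole on an ephemeral loopback port and returns the port. */
    static int startBlackHole() throws Exception {
        ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        int port = server.getLocalPort();
        Thread t = new Thread(() -> {
            try (ServerSocket ss = server;
                 Socket s = ss.accept();
                 InputStream in = s.getInputStream()) {
                byte[] buf = new byte[1024];
                while (in.read(buf) != -1) {
                    // Swallow every request; never write a response.
                }
            } catch (Exception ignored) {
                // Helper thread: any error simply ends it.
            }
        });
        t.setDaemon(true);
        t.start();
        return port;
    }

    /**
     * Sends a request and waits up to timeoutMs for any reply. Returns true if
     * the read timed out while the connection was still open.
     */
    static boolean requestTimesOutWithOpenConnection(int port, int timeoutMs) throws Exception {
        try (Socket client = new Socket(InetAddress.getLoopbackAddress(), port)) {
            client.setSoTimeout(timeoutMs);
            OutputStream out = client.getOutputStream();
            out.write(new byte[] {1, 2, 3}); // stand-in for a SASL request packet
            out.flush();
            try {
                client.getInputStream().read();
                return false; // got data or EOF: not the silent-server case
            } catch (SocketTimeoutException e) {
                // No reply within the deadline, yet the socket is still open.
                return client.isConnected() && !client.isClosed();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int port = startBlackHole();
        System.out.println(requestTimesOutWithOpenConnection(port, 200));
    }
}
```

A real test would point a ZooKeeper client at such a server (or drop only the SASL responses) and assert that the client gives up or retries within a bounded time instead of waiting forever.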

Background:

In ZOOKEEPER-1437, Ben writes:

"[I]t would be great to add a test that simply drops responses to clients without closing connections."

https://issues.apache.org/jira/browse/ZOOKEEPER-1437?focusedCommentId=13447477&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13447477

Also in ZOOKEEPER-1437 Rakesh writes: "I could see DisconnectableZooKeeper.disconnect() has network delays/partition simulation logic."

https://issues.apache.org/jira/browse/ZOOKEEPER-1437?focusedCommentId=13445704&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13445704

242163 No Perforce job exists for this issue. 0 12710
7 years, 29 weeks, 2 days ago 0|i02j6v:
ZooKeeper ZOOKEEPER-1546

"Unable to load database on disk" when restarting after node freeze

Bug Open Major Unresolved Unassigned Erik Forsberg Erik Forsberg 04/Sep/12 05:36   04/Jun/15 14:55   3.3.5   server   1 4   One of my zookeeper servers in a quorum of 3 froze (probably due to underlying hardware problems). When restarting, zookeeper fails to start with the following in zookeeper.log:

{noformat}
2012-09-04 09:02:35,300 - INFO [main:QuorumPeerConfig@90] - Reading configuration from: /etc/zookeeper/zoo.cfg
2012-09-04 09:02:35,316 - INFO [main:QuorumPeerConfig@310] - Defaulting to majority quorums
2012-09-04 09:02:35,333 - INFO [main:QuorumPeerMain@119] - Starting quorum peer
2012-09-04 09:02:35,358 - INFO [main:NIOServerCnxn$Factory@143] - binding to port 0.0.0.0/0.0.0.0:2181
2012-09-04 09:02:35,379 - INFO [main:QuorumPeer@819] - tickTime set to 2000
2012-09-04 09:02:35,380 - INFO [main:QuorumPeer@830] - minSessionTimeout set to -1
2012-09-04 09:02:35,380 - INFO [main:QuorumPeer@841] - maxSessionTimeout set to -1
2012-09-04 09:02:35,386 - INFO [main:QuorumPeer@856] - initLimit set to 10
2012-09-04 09:02:35,523 - INFO [main:FileSnap@82] - Reading snapshot /var/zookeeper/version-2/snapshot.500017240
2012-09-04 09:02:38,944 - ERROR [main:FileTxnSnapLog@226] - Failed to increment parent cversion for: /osp/production/scheduler/waitfordeps_tasks/per_period-3092724ef4d611e18411525400fff018-bulkload_histograms
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /osp/production/scheduler/waitfordeps_tasks/per_period-3092724ef4d611e18411525400fff018-bulkload_histograms
at org.apache.zookeeper.server.DataTree.incrementCversion(DataTree.java:1218)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.processTransaction(FileTxnSnapLog.java:224)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:152)
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:222)
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:398)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:143)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:103)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:76)
2012-09-04 09:02:38,945 - FATAL [main:QuorumPeer@400] - Unable to load database on disk
java.io.IOException: Failed to process transaction type: 2 error: KeeperErrorCode = NoNode for /osp/production/scheduler/waitfordeps_tasks/per_period-3092724ef4d611e18411525400fff018-bulkload_histograms
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:154)
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:222)
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:398)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:143)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:103)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:76)
2012-09-04 09:02:38,946 - FATAL [main:QuorumPeerMain@87] - Unexpected exception, exiting abnormally
java.lang.RuntimeException: Unable to run quorum server
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:401)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:143)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:103)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:76)
Caused by: java.io.IOException: Failed to process transaction type: 2 error: KeeperErrorCode = NoNode for /osp/production/scheduler/waitfordeps_tasks/per_period-3092724ef4d611e18411525400fff018-bulkload_histograms
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:154)
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:222)
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:398)
... 3 more

{noformat}

Removing the data from /var/zookeeper/version-2 and then restarting seems to "fix" the problem (the server gets a snapshot from one of the other nodes in the quorum).

This is Zookeeper 3.3.5+19.5-1~squeeze-cdh3, i.e. from Cloudera's distribution.
242164 No Perforce job exists for this issue. 0 12711
4 years, 42 weeks ago 0|i02j73:
ZooKeeper ZOOKEEPER-1545

Very odd issue with ZooKeeper when deploying two web applications in one Tomcat

Bug Open Major Unresolved Unassigned L.J.W L.J.W 04/Sep/12 02:20   23/Feb/19 03:06   3.4.3   java client   0 2   OS:windows 7 32
zookeeper 3.4.3
tomcat 7.0.29
If I deploy two applications (both using ZooKeeper) to the same Tomcat, ZooKeeper in one app inexplicably disconnects when Tomcat starts up.

The following is my code; it is very simple:

package com.abc.framework.cluster;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import org.springframework.beans.factory.InitializingBean;

public class ZKTester implements InitializingBean, Watcher {

    private ZooKeeper hZooKeeper;

    public void afterPropertiesSet() throws Exception {
        hZooKeeper = new ZooKeeper("localhost:2181", 300000, this);
    }

    public void process(WatchedEvent event) {
        System.out.println("**************" + event);
    }
}

And the Spring config file:

<bean id="zooTester" class="com.abc.framework.cluster.ZKTester"/>

And the following is Tomcat's startup log:

...
**************WatchedEvent state:Disconnected type:None path:null
**************WatchedEvent state:Expired type:None path:null
...
242165 No Perforce job exists for this issue. 0 12712
1 year, 3 weeks, 5 days ago 0|i02j7b:
ZooKeeper ZOOKEEPER-1544

System.exit() calls on interrupted SyncThread

Bug Resolved Trivial Duplicate Unassigned Dawid Weiss Dawid Weiss 03/Sep/12 10:01   03/Sep/12 12:03 03/Sep/12 12:03 3.3.6       0 2   We have a test framework at Lucene/Solr which attempts to interrupt threads that leak out of a single class (suite) scope. The problem we're facing is that ZooKeeper's SyncThread is doing this:
{code}
LOG.fatal("Severe unrecoverable error, exiting", t);
System.exit(11);
{code}

Is terminating the JVM really needed here? Could it be made optional with a system property, or even removed entirely? Currently it aborts the entire JUnit runner and prevents subsequent tests from running.
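One way to make the exit optional, sketched here as an alternative to the system property the report suggests, is to route the fatal-error path through a replaceable exit hook. The hook, its setter and onFatalError are hypothetical names, not ZooKeeper API; the exit code 11 matches the snippet above.

```java
import java.util.function.IntConsumer;

// Hypothetical pattern: fatal errors call a replaceable exit hook, so a test
// framework can observe the failure instead of having the JVM terminated.
// Not ZooKeeper's SyncThread code.
class ExitHookSketch {

    // Default behaviour: really terminate the JVM, as the current code does.
    private static volatile IntConsumer exitProcedure = System::exit;

    // Tests install a non-exiting procedure here.
    static void setExitProcedure(IntConsumer procedure) {
        exitProcedure = procedure;
    }

    // What a sync thread could call on an unrecoverable error.
    static void onFatalError(Throwable t) {
        System.err.println("Severe unrecoverable error, exiting: " + t);
        exitProcedure.accept(11);
    }

    public static void main(String[] args) {
        int[] requested = {-1};
        setExitProcedure(code -> requested[0] = code);
        onFatalError(new InterruptedException("interrupted by the test framework"));
        System.out.println("exit requested with code " + requested[0]);
        // prints "exit requested with code 11"
    }
}
```

In production code the failing thread still has to stop after the hook returns (for example by rethrowing), since a test hook does not terminate anything by itself.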

242166 No Perforce job exists for this issue. 0 12713
7 years, 29 weeks, 3 days ago 0|i02j7j:
ZooKeeper ZOOKEEPER-1543

Bad sessionId/password combo should return auth failure

Improvement Open Major Unresolved Unassigned Ben Bangert Ben Bangert 31/Aug/12 14:40   10/Sep/12 19:34   3.4.3, 3.3.6, 3.5.0   server   1 4   All When connecting to a server with a valid session ID but an invalid password, ZooKeeper disconnects with a SESSION_EXPIRED error. This is blatantly false; it's actually the wrong password. Returning SESSION_EXPIRED in this case is also not documented anywhere.

This makes debugging this issue an absolute nightmare, since the server has already led you down the wrong track (trying to figure out why the session is expired, when it isn't).

There's already an AUTH_FAILURE error, why not return that?
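A minimal sketch of a reconnect check that keeps the two cases apart: an unknown session ID yields SESSION_EXPIRED, while a known ID presented with the wrong password yields an authentication failure. The session table, the Result enum and checkReconnect are hypothetical, and the sketch does not reproduce ZooKeeper's actual session tracking.

```java
import java.security.MessageDigest;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical reconnect check that distinguishes "session gone" from
// "wrong password", as this issue requests. Not ZooKeeper's code.
class ReconnectCheckSketch {

    enum Result { OK, SESSION_EXPIRED, AUTH_FAILED }

    // Live sessions: session id -> session password.
    private final Map<Long, byte[]> liveSessions = new ConcurrentHashMap<>();

    void addSession(long sessionId, byte[] password) {
        liveSessions.put(sessionId, password.clone());
    }

    Result checkReconnect(long sessionId, byte[] presentedPassword) {
        byte[] expected = liveSessions.get(sessionId);
        if (expected == null) {
            return Result.SESSION_EXPIRED; // the session really is gone
        }
        // Constant-time comparison, so timing does not reveal matching bytes.
        if (!MessageDigest.isEqual(expected, presentedPassword)) {
            return Result.AUTH_FAILED; // session exists, password is wrong
        }
        return Result.OK;
    }

    public static void main(String[] args) {
        ReconnectCheckSketch server = new ReconnectCheckSketch();
        server.addSession(0x1390fd2ee630004L, new byte[] {1, 2, 3});
        System.out.println(server.checkReconnect(0x1390fd2ee630004L, new byte[] {9, 9, 9}));
        // prints "AUTH_FAILED"
    }
}
```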
242167 No Perforce job exists for this issue. 0 12714
7 years, 28 weeks, 3 days ago 0|i02j7r:
ZooKeeper ZOOKEEPER-1542

zkServer.sh start fails but exit status 0

Bug Open Major Unresolved Unassigned Ryu Umayahara Ryu Umayahara 24/Aug/12 14:53   24/Aug/12 15:00   3.3.6   scripts   0 1   Windows 7 + Cygwin zkServer.sh

Lines 99-100:

nohup $JAVA "-Dzookeeper.log.dir=${ZOO_LOG_DIR}" "-Dzookeeper.root.logger=${ZOO_LOG4J_PROP}" \
    -cp "$CLASSPATH" $JVMFLAGS $ZOOMAIN "$ZOOCFG" > "$_ZOO_DAEMON_OUT" 2>&1 < /dev/null &

The exit status of a background process cannot be captured this way: after a command that ends with &, $? is always 0, so the check in lines 101-114 never reports a failed start.

if [ $? -eq 0 ]
then
    if /bin/echo -n $! > "$ZOOPIDFILE"
    then
        sleep 1
        echo STARTED
    else
        echo FAILED TO WRITE PID
        exit 1
    fi
else
    echo SERVER DID NOT START
    exit 1
fi
242168 No Perforce job exists for this issue. 1 12715
7 years, 30 weeks, 6 days ago 0|i02j7z:
ZooKeeper ZOOKEEPER-1541

Zookeeper distributions are not available.

Bug Resolved Critical Cannot Reproduce Unassigned Yuta Okamoto Yuta Okamoto 24/Aug/12 01:09   09/Oct/13 02:19 09/Oct/13 02:19         0 2   I can't download zookeeper distribution because of "404 Not Found".

http://www.apache.org/dist/zookeeper/
242169 No Perforce job exists for this issue. 0 12716
6 years, 24 weeks, 1 day ago 0|i02j87:
ZooKeeper ZOOKEEPER-1540

ZOOKEEPER-1411 breaks backwards compatibility

Bug Resolved Major Fixed Andrew Ferguson Andrew Ferguson Andrew Ferguson 23/Aug/12 13:51   02/Mar/16 20:34 25/Sep/12 01:33 3.5.0 3.5.0     0 5   There is a one-line bug in ZOOKEEPER-1411 which breaks backwards compatibility for sites which are using separate configuration files for each server. The bug is with the handling of the clientPort option.

One line fix to follow shortly.

thanks!
Andrew
242170 No Perforce job exists for this issue. 2 12717
7 years, 26 weeks, 2 days ago 0|i02j8f:
ZooKeeper ZOOKEEPER-1539

Tests in QuorumUtil.startAll() and JMXenv

Bug Open Minor Unresolved Unassigned Alexander Shraer Alexander Shraer 22/Aug/12 00:59   22/Sep/12 21:49       tests   0 1   Consider the following test:

@Test
public void newTest() throws Exception {
    QuorumUtil qu = new QuorumUtil(3);
    qu.startAll();
}

Although it doesn't look like we're checking anything here, this test actually fails. There is a JMXEnv.ensureAll check invoked from startAll(). It passes for QuorumUtil(1) or QuorumUtil(2) but fails for any larger number of servers. Besides the fact that there's a bug in the tests, I think we should name the function differently if we want it to run checks, or alternatively remove these checks or make them optional via a parameter.
242171 No Perforce job exists for this issue. 0 12718
7 years, 31 weeks, 1 day ago 0|i02j8n:
ZooKeeper ZOOKEEPER-1538

Improve space handling in zkServer.sh and zkEnv.sh

Bug Resolved Trivial Fixed Andrew Ferguson Andrew Ferguson Andrew Ferguson 21/Aug/12 20:00   25/Jun/13 14:07 07/Sep/12 02:23 3.4.3 3.5.0     0 6   Running `bin/zkServer.sh start` from a freshly built copy of trunk fails if the source code is checked out to a directory with spaces in its name. I'll include a small patch to fix this problem.

thanks!
242172 No Perforce job exists for this issue. 1 12719
6 years, 39 weeks, 2 days ago 0|i02j8v:
ZooKeeper ZOOKEEPER-1537

registration page not accepting capital letters

Bug Resolved Minor Incomplete Unassigned mohammad taher mohammad taher 17/Aug/12 10:20   30/Aug/12 01:39 30/Aug/12 01:39 3.3.5   c client   0 3 1510560 1510560 0% WINDOWS XP, MOZILLA FIREFOX, 500 GB HARD DISK, 2 GB RAM
1.Type zookeeper URL in the address bar to go to home page of it.
2.For new users, click on "new user" and it will open a registration form.
3. Enter your full name in capital letters, as instructed.
4. Even though I enter capital letters, the form does not accept them and shows the error message "PLEASE TYPE CAPITAL LETTERS".
0% 0% 1510560 1510560 performance 242173 No Perforce job exists for this issue. 0 12720
7 years, 30 weeks ago 0|i02j93:
ZooKeeper ZOOKEEPER-1536

c client : memory leak in winport.c

Bug Resolved Major Fixed brooklin brooklin brooklin 15/Aug/12 23:13   31/Aug/12 07:02 30/Aug/12 16:38 3.4.3 3.4.4, 3.5.0 c client   0 5   Windows 7 At line 99, winport.c calls the Windows API "InitializeCriticalSection" but never calls "DeleteCriticalSection". 242174 No Perforce job exists for this issue. 1 12721
7 years, 29 weeks, 6 days ago 0|i02j9b:
ZooKeeper ZOOKEEPER-1535

ZK Shell/Cli re-executes last command on exit

Bug Closed Major Fixed Edward Ribeiro Stu Hood Stu Hood 14/Aug/12 19:56   20/May/17 19:07 30/Dec/12 22:21   3.4.6, 3.5.0 scripts   0 6   zookeeper-3.4.3 release In the ZK 3.4.3 release's version of zkCli.sh, the last command that was executed is *re*-executed when you {{ctrl+d}} out of the shell. In the snippet below, {{ls}} is executed and then {{ctrl+d}} is pressed (marked inline for illustration); the output from {{ls}} appears again because the command is re-run.
{noformat}
[zk: zookeeper.example.com:2181(CONNECTED) 0] ls /blah
[foo]
[zk: zookeeper.example.com:2181(CONNECTED) 1] <ctrl+d> [foo]
$
{noformat}
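For comparison, here is a minimal sketch of a read-execute loop that treats end of input (what {{ctrl+d}} produces on a terminal) as the signal to stop, so nothing is executed after it. It reads from a BufferedReader rather than an interactive console, and the class and method names are hypothetical; the report does not show where the 3.4.3 loop goes wrong.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical shell loop: end of input ends the loop instead of falling
// through to the previously executed command. Not ZooKeeperMain's code.
class ShellLoopSketch {

    static void run(BufferedReader in, Consumer<String> executeLine) throws IOException {
        String line;
        // readLine() returns null at end of input; stop there rather than
        // reusing whatever the previous iteration read.
        while ((line = in.readLine()) != null) {
            if (!line.trim().isEmpty()) {
                executeLine.accept(line);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> executed = new ArrayList<>();
        run(new BufferedReader(new StringReader("ls /blah\n")), executed::add);
        System.out.println(executed); // prints "[ls /blah]": executed once, not twice
    }
}
```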
cli, shell, zkcli, zkcli.sh 242175 No Perforce job exists for this issue. 2 12722
6 years, 2 weeks ago 0|i02j9j:
ZooKeeper ZOOKEEPER-1534

ZooKeeper server does not send SASL authentication failure notification to the client

Bug Open Major Unresolved Unassigned Tally Tsabary Tally Tsabary 13/Aug/12 04:55   14/Feb/18 15:46   3.4.3   server   0 5   Windows 7. Zookeeper 3.4.3 Curator 1.1.15 Java 1.6 Server side: zookeeper 3.4.3 with patch ZOOKEEPER-1437.patch 22/Jun/12 00:24
Client side: java, Curator 1.1.15, zookeeper 3.4.3 with patch ZOOKEEPER-1437.patch 22/Jun/12 00:24

The environment is configured to use SASL authentication.
When authentication succeeds, everything works fine.
When authentication fails, it seems that the ZK server catches the SaslException and closes the socket without sending any notification to the client, so although the client has an implementation to handle SASL authentication failure, it is never used.

Details:
=========


zk server log:
{noformat}
2012-08-10 11:00:46,730 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@213] - Accepted socket connection from /127.0.0.1:50208
2012-08-10 11:00:46,731 [myid:] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@780] - Session establishment request from client /127.0.0.1:50208 client's lastZxid is 0x0
2012-08-10 11:00:46,731 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@838] - Client attempting to establish new session at /127.0.0.1:50208
2012-08-10 11:00:46,733 [myid:] - DEBUG [SyncThread:0:FinalRequestProcessor@88] - Processing request:: sessionid:0x1390fd2ee630004 type:createSession cxid:0x0 zxid:0x26b txntype:-10 reqpath:n/a
2012-08-10 11:00:46,733 [myid:] - DEBUG [SyncThread:0:FinalRequestProcessor@160] - sessionid:0x1390fd2ee630004 type:createSession cxid:0x0 zxid:0x26b txntype:-10 reqpath:n/a
2012-08-10 11:00:46,734 [myid:] - INFO [SyncThread:0:ZooKeeperServer@604] - Established session 0x1390fd2ee630004 with negotiated timeout 40000 for client /127.0.0.1:50208
2012-08-10 11:00:46,736 [myid:] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@919] - Responding to client SASL token.
2012-08-10 11:00:46,736 [myid:] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@923] - Size of client SASL token: 0
2012-08-10 11:00:46,736 [myid:] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@954] - Size of server SASL response: 101
2012-08-10 11:00:46,740 [myid:] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@919] - Responding to client SASL token.
2012-08-10 11:00:46,741 [myid:] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@923] - Size of client SASL token: 272
2012-08-10 11:00:46,741 [myid:] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@106] - client supplied realm: zk-sasl-md5
2012-08-10 11:00:46,741 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@939] - Client failed to SASL authenticate: javax.security.sasl.SaslException: DIGEST-MD5: digest response format violation. Mismatched response.
2012-08-10 11:00:46,742 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@945] - Closing client connection due to SASL authentication failure.
2012-08-10 11:00:46,742 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1000] - Closed socket connection for client /127.0.0.1:50208 which had sessionid 0x1390fd2ee630004
2012-08-10 11:00:46,743 [myid:] - ERROR [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@180] - Unexpected Exception:
java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:153)
at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1075)
at org.apache.zookeeper.server.ZooKeeperServer.processPacket(ZooKeeperServer.java:906)
at org.apache.zookeeper.server.NIOServerCnxn.readRequest(NIOServerCnxn.java:365)
at org.apache.zookeeper.server.NIOServerCnxn.readPayload(NIOServerCnxn.java:202)
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:236)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:224)
at java.lang.Thread.run(Thread.java:662)
{noformat}

The corresponding source in org.apache.zookeeper.server.ZooKeeperServer:

{noformat}
private Record processSasl(ByteBuffer incomingBuffer, ServerCnxn cnxn) throws IOException {
    LOG.debug("Responding to client SASL token.");
    GetSASLRequest clientTokenRecord = new GetSASLRequest();
    ByteBufferInputStream.byteBuffer2Record(incomingBuffer,clientTokenRecord);
    byte[] clientToken = clientTokenRecord.getToken();
    LOG.debug("Size of client SASL token: " + clientToken.length);
    byte[] responseToken = null;
    try {
        ZooKeeperSaslServer saslServer = cnxn.zooKeeperSaslServer;
        try {
            // note that clientToken might be empty (clientToken.length == 0):
            // if using the DIGEST-MD5 mechanism, clientToken will be empty at the beginning of the
            // SASL negotiation process.
            responseToken = saslServer.evaluateResponse(clientToken);
            if (saslServer.isComplete() == true) {
                String authorizationID = saslServer.getAuthorizationID();
                LOG.info("adding SASL authorization for authorizationID: " + authorizationID);
                cnxn.addAuthInfo(new Id("sasl",authorizationID));
            }
        }
        catch (SaslException e) {
            LOG.warn("Client failed to SASL authenticate: " + e);
            if ((System.getProperty("zookeeper.allowSaslFailedClients") != null)
              &&
              (System.getProperty("zookeeper.allowSaslFailedClients").equals("true"))) {
                LOG.warn("Maintaining client connection despite SASL authentication failure.");
            } else {
                LOG.warn("Closing client connection due to SASL authentication failure.");
                cnxn.close(); // Tally: at this stage the socket is closed without sending any notification to the client
            }
        }
    }
    catch (NullPointerException e) {
        LOG.error("cnxn.saslServer is null: cnxn object did not initialize its saslServer properly.");
    }
    if (responseToken != null) {
        LOG.debug("Size of server SASL response: " + responseToken.length);
    }
    // wrap SASL response token to client inside a Response object.
    return new SetSASLResponse(responseToken);
}
{noformat}


The client log shows that the client detects the closed socket and simply retries the connection, as if the ZK server had gone down:
{noformat}
[10-Aug-2012 11:00:44.558 IST] INFO <org.apache.zookeeper.ClientCnxn$SendThread> Opening socket connection to server 127.0.0.1/127.0.0.1:2181
[10-Aug-2012 11:00:44.559 IST] INFO <org.apache.zookeeper.client.ZooKeeperSaslClient> Found Login Context section 'Client': will use it to attempt to SASL-authenticate.
[10-Aug-2012 11:00:44.560 IST] INFO <org.apache.zookeeper.client.ZooKeeperSaslClient> Client will use DIGEST-MD5 as SASL mechanism.
[10-Aug-2012 11:00:44.561 IST] INFO <org.apache.zookeeper.ClientCnxn$SendThread> Socket connection established to 127.0.0.1/127.0.0.1:2181, initiating session
[10-Aug-2012 11:00:44.563 IST] DEBUG <org.apache.zookeeper.ClientCnxn$SendThread> Session establishment request sent on 127.0.0.1/127.0.0.1:2181
[10-Aug-2012 11:00:44.564 IST] DEBUG <org.apache.zookeeper.ClientCnxnSocketNIO> deferring non-priming packet: clientPath:null serverPath:null finished:false header:: 0,3 replyHeader:: 0,0,0 request:: '/dev,F response:: until SASL authentication completes.
[10-Aug-2012 11:00:44.566 IST] DEBUG <org.apache.zookeeper.ClientCnxnSocketNIO> deferring non-priming packet: clientPath:/ serverPath:/ finished:false header:: 0,9 replyHeader:: 0,0,0 request:: '/ response:: until SASL authentication completes.
[10-Aug-2012 11:00:44.568 IST] INFO <org.apache.zookeeper.ClientCnxn$SendThread> Session establishment complete on server 127.0.0.1/127.0.0.1:2181, sessionid = 0x1390fd2ee630003, negotiated timeout = 40000
[10-Aug-2012 11:00:44.569 IST] INFO <com.netflix.curator.framework.state.ConnectionStateManager> State change: RECONNECTED
[10-Aug-2012 11:00:44.569 IST] DEBUG <org.apache.zookeeper.ClientCnxnSocketNIO> deferring non-priming packet: clientPath:null serverPath:null finished:false header:: 0,3 replyHeader:: 0,0,0 request:: '/dev,F response:: until SASL authentication completes.
[10-Aug-2012 11:00:44.572 IST] DEBUG <org.apache.zookeeper.ClientCnxnSocketNIO> deferring non-priming packet: clientPath:/ serverPath:/ finished:false header:: 0,9 replyHeader:: 0,0,0 request:: '/ response:: until SASL authentication completes.
[10-Aug-2012 11:00:44.574 IST] DEBUG <org.apache.zookeeper.ClientCnxnSocketNIO> deferring non-priming packet: clientPath:null serverPath:null finished:false header:: 0,3 replyHeader:: 0,0,0 request:: '/dev,F response:: until SASL authentication completes.
[10-Aug-2012 11:00:44.576 IST] DEBUG <org.apache.zookeeper.ClientCnxnSocketNIO> deferring non-priming packet: clientPath:/ serverPath:/ finished:false header:: 0,9 replyHeader:: 0,0,0 request:: '/ response:: until SASL authentication completes.
[10-Aug-2012 11:00:44.578 IST] DEBUG <org.apache.zookeeper.client.ZooKeeperSaslClient> ClientCnxn:sendSaslPacket:length=0
[10-Aug-2012 11:00:44.579 IST] DEBUG <org.apache.zookeeper.ClientCnxnSocketNIO> deferring non-priming packet: clientPath:null serverPath:null finished:false header:: 0,3 replyHeader:: 0,0,0 request:: '/dev,F response:: until SASL authentication completes.
[10-Aug-2012 11:00:44.581 IST] DEBUG <org.apache.zookeeper.ClientCnxnSocketNIO> deferring non-priming packet: clientPath:/ serverPath:/ finished:false header:: 0,9 replyHeader:: 0,0,0 request:: '/ response:: until SASL authentication completes.
[10-Aug-2012 11:00:44.583 IST] DEBUG <org.apache.zookeeper.ClientCnxnSocketNIO> deferring non-priming packet: clientPath:/ serverPath:/ finished:false header:: 0,9 replyHeader:: 0,0,0 request:: '/ response:: until SASL authentication completes.
[10-Aug-2012 11:00:44.585 IST] DEBUG <org.apache.zookeeper.ClientCnxnSocketNIO> deferring non-priming packet: clientPath:null serverPath:null finished:false header:: 0,3 replyHeader:: 0,0,0 request:: '/dev,F response:: until SASL authentication completes.
[10-Aug-2012 11:00:44.587 IST] DEBUG <org.apache.zookeeper.ClientCnxnSocketNIO> deferring non-priming packet: clientPath:/ serverPath:/ finished:false header:: 0,9 replyHeader:: 0,0,0 request:: '/ response:: until SASL authentication completes.
[10-Aug-2012 11:00:44.589 IST] DEBUG <org.apache.zookeeper.ClientCnxnSocketNIO> deferring non-priming packet: clientPath:/ serverPath:/ finished:false header:: 0,9 replyHeader:: 0,0,0 request:: '/ response:: until SASL authentication completes.
[10-Aug-2012 11:00:44.591 IST] DEBUG <org.apache.zookeeper.client.ZooKeeperSaslClient$2> saslClient.evaluateChallenge(len=101)
[10-Aug-2012 11:00:44.592 IST] DEBUG <org.apache.zookeeper.client.ZooKeeperSaslClient> ClientCnxn:sendSaslPacket:length=272
[10-Aug-2012 11:00:44.593 IST] DEBUG <org.apache.zookeeper.ClientCnxnSocketNIO> deferring non-priming packet: clientPath:null serverPath:null finished:false header:: 0,3 replyHeader:: 0,0,0 request:: '/dev,F response:: until SASL authentication completes.
[10-Aug-2012 11:00:44.596 IST] DEBUG <org.apache.zookeeper.ClientCnxnSocketNIO> deferring non-priming packet: clientPath:/ serverPath:/ finished:false header:: 0,9 replyHeader:: 0,0,0 request:: '/ response:: until SASL authentication completes.
[10-Aug-2012 11:00:44.598 IST] DEBUG <org.apache.zookeeper.ClientCnxnSocketNIO> deferring non-priming packet: clientPath:/ serverPath:/ finished:false header:: 0,9 replyHeader:: 0,0,0 request:: '/ response:: until SASL authentication completes.
[10-Aug-2012 11:00:44.600 IST] INFO <org.apache.zookeeper.ClientCnxn$SendThread> Unable to read additional data from server sessionid 0x1390fd2ee630003, likely server has closed socket, closing socket connection and attempting reconnect
[10-Aug-2012 11:00:44.701 IST] ERROR <com.netflix.curator.framework.imps.CuratorFrameworkImpl> Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:438)
at com.netflix.curator.framework.imps.BackgroundSyncImpl$1.processResult(BackgroundSyncImpl.java:49)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:606)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
[10-Aug-2012 11:00:44.706 IST] INFO <com.netflix.curator.framework.state.ConnectionStateManager> State change: LOST
[10-Aug-2012 11:00:44.708 IST] WARN <com.netflix.curator.framework.state.ConnectionStateManager> ConnectionStateManager queue full - dropping events to make room
[10-Aug-2012 11:00:44.710 IST] INFO <com.netflix.curator.framework.state.ConnectionStateManager> State change: SUSPENDED
{noformat}
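The client cannot tell the two situations apart because in both it only sees the socket close. Below is a minimal sketch of the distinction the report asks for: the server writes an explicit status code before closing, and the client treats that code as a final authentication failure but treats a bare EOF as connection loss to retry. The status code, wire format and method names are hypothetical, not ZooKeeper's protocol, and in-memory streams stand in for the socket.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical convention: on SASL failure the server writes a status code
// before closing, so the client can tell "authentication failed" apart from
// "server went away". Not ZooKeeper's actual wire protocol.
class SaslFailureSignalSketch {

    static final int AUTH_FAILED = 1; // hypothetical status code

    /** Server side: the bytes sent before the connection closes. */
    static byte[] serverCloseBytes(boolean notifyClient) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        if (notifyClient) {
            new DataOutputStream(wire).writeInt(AUTH_FAILED);
        }
        return wire.toByteArray(); // empty array = socket simply closed
    }

    /** Client side: decide what to do once the server has closed. */
    static String clientReaction(byte[] received) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(received));
        try {
            int status = in.readInt();
            return status == AUTH_FAILED ? "AUTH_FAILED: stop retrying" : "UNKNOWN_STATUS";
        } catch (EOFException e) {
            return "CONNECTION_LOSS: reconnect";
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(clientReaction(serverCloseBytes(false))); // prints "CONNECTION_LOSS: reconnect"
        System.out.println(clientReaction(serverCloseBytes(true)));  // prints "AUTH_FAILED: stop retrying"
    }
}
```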
242176 No Perforce job exists for this issue. 0 12723
2 years, 5 weeks, 1 day ago 0|i02j9r:
ZooKeeper ZOOKEEPER-1533

Correct the documentation of the args for the JavaExample doc.

Bug Resolved Minor Fixed Warren Turkal Warren Turkal Warren Turkal 13/Aug/12 02:50   02/Mar/16 20:35 14/Aug/12 19:11 3.3.0, 3.3.1, 3.3.2, 3.3.3, 3.3.4, 3.4.0, 3.4.1, 3.4.2, 3.4.3, 3.3.5, 3.3.6, 3.4.4, 3.5.0 3.5.0 documentation   0 4   Small doc fix in the JavaExample doc. 242177 No Perforce job exists for this issue. 1 12724
7 years, 32 weeks, 1 day ago 0|i02j9z:
ZooKeeper ZOOKEEPER-1532

Correct the documentation of the args for the JavaExample doc.

Improvement Resolved Major Invalid Unassigned Warren Turkal Warren Turkal 09/Aug/12 19:13   09/May/14 17:44 09/May/14 17:44   3.5.0     0 1   Small doc fix. 242178 No Perforce job exists for this issue. 0 12725
5 years, 47 weeks, 6 days ago 0|i02ja7:
ZooKeeper ZOOKEEPER-1531

Correct the documentation of the args for the JavaExample doc.

Bug Resolved Major Duplicate Unassigned Warren Turkal Warren Turkal 09/Aug/12 19:13   03/Sep/13 02:53 03/Sep/13 02:53   3.5.0     0 1   Small doc fix. 242179 No Perforce job exists for this issue. 0 12726
6 years, 29 weeks, 3 days ago 0|i02jaf:
ZooKeeper ZOOKEEPER-1530

Correct the documentation of the args for the JavaExample doc.

Bug Resolved Major Duplicate Unassigned Warren Turkal Warren Turkal 09/Aug/12 19:13   03/Sep/13 02:54 03/Sep/13 02:54         0 0   Small doc fix. 242180 No Perforce job exists for this issue. 0 12727
7 years, 33 weeks ago 0|i02jan:
ZooKeeper ZOOKEEPER-1529

Correct the documentation of the args for the JavaExample doc.

Bug Resolved Minor Duplicate Unassigned Warren Turkal Warren Turkal 09/Aug/12 16:51   03/Sep/13 02:54 03/Sep/13 02:54         0 0   Correct the documentation of the args for the JavaExample doc. 242181 No Perforce job exists for this issue. 0 12728
7 years, 33 weeks ago 0|i02jav:
ZooKeeper ZOOKEEPER-1528

Correct the documentation of the args for the JavaExample doc.

Bug Resolved Minor Duplicate Unassigned Warren Turkal Warren Turkal 09/Aug/12 16:50   03/Sep/13 02:55 03/Sep/13 02:55         0 0   I added another listitem documenting the filename arg of the JavaExample code. 242182 No Perforce job exists for this issue. 0 12729
7 years, 33 weeks ago 0|i02jb3:
ZooKeeper ZOOKEEPER-1527

Correct the documentation of the args for the JavaExample doc

Bug Resolved Trivial Duplicate Unassigned Warren Turkal Warren Turkal 09/Aug/12 16:50   11/Oct/13 12:39 11/Oct/13 12:39         0 0   I added another listitem documenting the filename arg of the JavaExample code. 242183 No Perforce job exists for this issue. 0 12730
6 years, 23 weeks, 6 days ago 0|i02jbb:
ZooKeeper ZOOKEEPER-1526

Correct the documentation of the args for the JavaExample doc

Bug Open Trivial Unresolved Unassigned Warren Turkal Warren Turkal 09/Aug/12 16:50   09/Aug/12 16:50           0 0   I added another listitem documenting the filename arg of the JavaExample code. 242184 No Perforce job exists for this issue. 0 12731
7 years, 33 weeks ago 0|i02jbj:
ZooKeeper ZOOKEEPER-1525

Plumb ZooKeeperServer object into auth plugins

Improvement Resolved Major Fixed Jordan Zimmerman Warren Turkal Warren Turkal 02/Aug/12 19:35   21/Nov/16 17:43 17/Nov/16 11:20 3.5.0 3.6.0     7 11   ZOOKEEPER-2143 I want to plumb the ZooKeeperServer object into the auth plugins so that I can store authentication data in zookeeper itself. With access to the ZooKeeperServer object, I also have access to the ZKDatabase and can look up entries in the local copy of the zookeeper data.

In order to implement this, I make sure that a ZooKeeperServer instance is passed in to the ProviderRegistry.initialize() method. Then initialize() will try to find a constructor for the AuthenticationProvider that takes a ZooKeeperServer instance. If the constructor is found, it will be used. Otherwise, initialize() will look for a constructor that takes no arguments and use that instead.
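The constructor-selection rule described above (prefer a constructor taking the server, else fall back to a no-argument constructor) can be sketched with plain reflection. This is a minimal sketch, not the actual ProviderRegistry code: `StubZooKeeperServer`, `AuthPlugin`, `ServerAwarePlugin`, `LegacyPlugin` and `PluginLoader` are hypothetical stand-ins so the example compiles on its own.

```java
import java.lang.reflect.Constructor;

// Hypothetical stand-in for org.apache.zookeeper.server.ZooKeeperServer so the
// sketch compiles on its own; the real class is not used here.
class StubZooKeeperServer {
    final String name;
    StubZooKeeperServer(String name) { this.name = name; }
}

// Hypothetical stand-in for the AuthenticationProvider interface.
interface AuthPlugin {
    String describe();
}

// Plugin that wants the server, e.g. to look up auth data in the local ZKDatabase.
class ServerAwarePlugin implements AuthPlugin {
    private final StubZooKeeperServer zks;
    public ServerAwarePlugin(StubZooKeeperServer zks) { this.zks = zks; }
    public String describe() { return "server-aware:" + zks.name; }
}

// Existing-style plugin that only has a no-argument constructor.
class LegacyPlugin implements AuthPlugin {
    public LegacyPlugin() { }
    public String describe() { return "legacy"; }
}

final class PluginLoader {
    /** Prefers a constructor taking the server; falls back to the no-arg constructor. */
    static AuthPlugin instantiate(Class<? extends AuthPlugin> cls, StubZooKeeperServer zks) {
        try {
            try {
                Constructor<? extends AuthPlugin> withServer =
                        cls.getConstructor(StubZooKeeperServer.class);
                return withServer.newInstance(zks);
            } catch (NoSuchMethodException noServerCtor) {
                // No (server) constructor: use the plain no-argument one.
                return cls.getConstructor().newInstance();
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + cls.getName(), e);
        }
    }
}
```

Trying the richer constructor first keeps existing plugins working unchanged, while new plugins opt in to server access simply by declaring the extra constructor.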
239687 No Perforce job exists for this issue. 12 2543
3 years, 17 weeks, 3 days ago Plumb ZooKeeperServer object into auth plugins. 0|i00sgn:
ZooKeeper ZOOKEEPER-1524

use more standard junit annotation "@Before" in SaslXTests rather than static blocks

Improvement Open Minor Unresolved Eugene Joseph Koontz Eugene Joseph Koontz Eugene Joseph Koontz 01/Aug/12 15:39   01/Aug/12 15:56       tests   0 1   The following tests:

AuthTest.java
SaslAuthFailTest.java
SaslAuthDesignatedClientTest.java
SaslAuthFailDesignatedClientTest.java
SaslAuthMissingClientConfigTest.java
SaslAuthTest.java

use "static {..}" blocks to initialize system properties and files prior to the test runs. As Patrick points out in ZOOKEEPER-1503, we should instead use JUnit's @Before annotation:


http://junit.sourceforge.net/javadoc/org/junit/Before.html

rather than static blocks, to make our tests more consistent and easier to understand.
242185 No Perforce job exists for this issue. 0 12732
7 years, 34 weeks, 1 day ago 0|i02jbr:
ZooKeeper ZOOKEEPER-1523

Better logging during instance loading/syncing

Improvement Open Critical Unresolved Unassigned Jordan Zimmerman Jordan Zimmerman 31/Jul/12 17:26   09/Jul/19 16:00   3.3.5   quorum, server   0 6 0 9000   When an instance is coming up and loading from snapshot, better logging is needed so an operator knows how long until completion. Also, when syncing with the leader, better logging is needed to know how long until success. 100% 100% 9000 0 pull-request-available 242186 No Perforce job exists for this issue. 0 12733
36 weeks, 2 days ago 0|i02jbz:
ZooKeeper ZOOKEEPER-1522

intermittent failures in Zab test due to NPE in recursiveDelete test function

Bug Resolved Major Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 30/Jul/12 18:27   01/Aug/12 17:12 01/Aug/12 11:57 3.4.3, 3.5.0 3.4.4, 3.5.0 tests   0 3   The jdk7 test job on jenkins is failing intermittently with

{noformat}
java.lang.NullPointerException
at org.apache.zookeeper.server.quorum.Zab1_0Test.recursiveDelete(Zab1_0Test.java:917)
at org.apache.zookeeper.server.quorum.Zab1_0Test.recursiveDelete(Zab1_0Test.java:918)
at org.apache.zookeeper.server.quorum.Zab1_0Test.recursiveDelete(Zab1_0Test.java:918)
at org.apache.zookeeper.server.quorum.Zab1_0Test.testPopulatedLeaderConversation(Zab1_0Test.java:419)
at org.apache.zookeeper.server.quorum.Zab1_0Test.testUnnecessarySnap(Zab1_0Test.java:483)
{noformat}

It seems not to handle the case where the file is deleted out from under it. Also, I would think the recursive deletes should come at the very end of the finally block.
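One way to make a test helper tolerate a file disappearing mid-walk is to guard the result of `File.listFiles()`, which returns null when the path is no longer a directory or cannot be read. This is a minimal sketch assuming a standalone helper; `TestDirs.recursiveDelete` is a hypothetical name, not the Zab1_0Test code.

```java
import java.io.File;

final class TestDirs {
    /**
     * Deletes a file or directory tree and returns true if nothing remains at the path.
     * File.listFiles() returns null if the path is not (or no longer) a directory or
     * cannot be read, e.g. because it was deleted concurrently, so a null result is
     * treated as "nothing to recurse into" instead of being dereferenced.
     */
    static boolean recursiveDelete(File f) {
        if (f.isDirectory()) {
            File[] children = f.listFiles();
            if (children != null) {
                for (File child : children) {
                    recursiveDelete(child);
                }
            }
        }
        // delete() also returns false if the file is already gone, so check existence instead.
        f.delete();
        return !f.exists();
    }
}
```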
242016 No Perforce job exists for this issue. 1 12501
7 years, 34 weeks, 1 day ago
Reviewed
0|i02hwf:
ZooKeeper ZOOKEEPER-1521

LearnerHandler initLimit/syncLimit problems specifying follower socket timeout limits

Bug Resolved Critical Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 26/Jul/12 11:44   29/Jul/12 07:02 29/Jul/12 01:08 3.4.3, 3.3.5, 3.5.0 3.3.6, 3.4.4, 3.5.0 server   0 9   branch 3.3: The leader expects the follower to initialize within syncLimit rather than initLimit. In LearnerHandler.run() (line 395 on branch33) we wait for the ack from the follower with a timeout of syncLimit.

branch 3.4+: it seems ZOOKEEPER-1136 introduced a regression while attempting to fix the problem. It sets the timeout to initLimit, but it never resets the timeout to syncLimit once the ack is received.
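The intended behaviour can be sketched as choosing the read timeout by phase: initLimit ticks while the follower connects and syncs, then syncLimit ticks once its ack has arrived. This is a sketch under the assumption that the limit is enforced as a socket read timeout; `LearnerPhase` and `LearnerTimeouts` are hypothetical names, not LearnerHandler's API.

```java
import java.net.Socket;
import java.net.SocketException;

// Hypothetical phases of a learner connection as seen from the leader.
enum LearnerPhase { INITIALIZING, SYNCED }

final class LearnerTimeouts {
    private final int tickTimeMs;
    private final int initLimit; // ticks allowed for the initial connect and sync
    private final int syncLimit; // ticks allowed between leader and follower afterwards

    LearnerTimeouts(int tickTimeMs, int initLimit, int syncLimit) {
        this.tickTimeMs = tickTimeMs;
        this.initLimit = initLimit;
        this.syncLimit = syncLimit;
    }

    /** Read timeout in milliseconds for the given phase. */
    int readTimeoutMs(LearnerPhase phase) {
        int ticks = (phase == LearnerPhase.INITIALIZING) ? initLimit : syncLimit;
        return tickTimeMs * ticks;
    }

    /** Applies the phase's timeout; call again with SYNCED once the follower's ack arrives. */
    void apply(Socket sock, LearnerPhase phase) throws SocketException {
        sock.setSoTimeout(readTimeoutMs(phase));
    }
}
```

The regression described above corresponds to calling `apply(sock, INITIALIZING)` and never making the second call with `SYNCED`.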
242017 No Perforce job exists for this issue. 3 12504
7 years, 34 weeks, 4 days ago
Reviewed
0|i02hx3:
ZooKeeper ZOOKEEPER-1520

A txn log record with a corrupt sentinel byte looks like EOF

Bug Open Minor Unresolved Bill Bridge Bill Bridge Bill Bridge 25/Jul/12 14:42   05/Feb/20 07:16   3.3.5 3.7.0, 3.5.8 server   1 4 86400 86400 0% all In Util.readTxnBytes(), the sentinel is compared with 0x42; if it does not match, the record is considered partially written and is therefore treated as EOF. However, a partially written record should have a sentinel of 0x00, since that is what the log is initialized with. Any other value indicates corruption and should throw an IOException rather than signal EOF. See [ZOOKEEPER-1453|https://issues.apache.org/jira/browse/ZOOKEEPER-1453] for a related issue. 0% 0% 86400 86400 newbie, patch 239662 No Perforce job exists for this issue. 5 2495
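The three-way check proposed in ZOOKEEPER-1520 (0x42 means a complete record, 0x00 means a partial write to be treated as EOF, anything else means corruption) can be sketched as follows. This is a minimal illustration, not Util.readTxnBytes() itself; `TxnSentinel` and `isCompleteRecord` are hypothetical names.

```java
import java.io.IOException;

final class TxnSentinel {
    static final byte COMPLETE = 0x42;  // sentinel expected after a complete record (per the issue)
    static final byte UNWRITTEN = 0x00; // value the log is initialized with

    /**
     * Returns true for a complete record, false for a partially written record
     * (which the caller treats as EOF), and throws for any other value (corruption).
     */
    static boolean isCompleteRecord(byte sentinel) throws IOException {
        if (sentinel == COMPLETE) {
            return true;
        }
        if (sentinel == UNWRITTEN) {
            return false;
        }
        throw new IOException(String.format(
                "Corrupt txn log: unexpected sentinel 0x%02x", sentinel & 0xff));
    }
}
```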
5 years, 51 weeks, 2 days ago 0|i00s5z:
ZooKeeper ZOOKEEPER-1519

Zookeeper Async calls can reference free()'d memory

Bug Open Major Unresolved Daniel Lescohier Mark Gius Mark Gius 25/Jul/12 13:34   05/Feb/20 07:16   3.3.3, 3.3.6 3.7.0, 3.5.8 c client   0 8   Ubuntu 11.10, Ubuntu packaged Zookeeper 3.3.3 with some backported fixes. zoo_acreate() and zoo_aset() take a char * argument for data and prepare a call to zookeeper. This char * does not appear to be duplicated at any point, so the caller of the asynchronous function could free() it before the zookeeper library completes its request. This is unlikely to cause a real problem unless the freed memory is reused before zookeeper consumes it; as a result, I have been unable to reproduce the issue in pure C.

However, ZKPython is a whole different story. Consider this snippet:

ok = zookeeper.acreate(handle, path, json.dumps(value),
                       acl, flags, callback)
assert ok == zookeeper.OK

In this snippet, json.dumps() allocates a string which is passed into acreate(). When acreate() returns, the zookeeper request has been constructed with a pointer to the string allocated by json.dumps(). At that point the string is no longer referenced by anything (ZKPython doesn't bump the refcount), so it is eligible for garbage collection and reuse. The Zookeeper request is left holding a dangling pointer to freed memory.

I've been seeing odd behavior in our development environments for some time now, where two separate JSON payloads appeared to have been joined together: Python had allocated a new JSON string in the middle of an old string that a pending zookeeper async call had not yet processed.

I am not sure whether this behavior should be documented, or whether the C binding implementation needs to be updated to copy the data payload provided to aset and acreate.
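The second option above, copying the payload when the request is queued, is a general defensive-copy pattern. Java's garbage collector rules out a literal use-after-free, but the analogous hazard is a caller reusing a mutable buffer before the queued request is sent. The sketch below illustrates that pattern in Java only; it is not the C binding, and `AsyncSetQueue`, `PendingSet` and their methods are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

// Hypothetical async client: requests are queued now and sent later by another thread.
final class AsyncSetQueue {
    static final class PendingSet {
        final String path;
        final byte[] data;
        PendingSet(String path, byte[] data) { this.path = path; this.data = data; }
    }

    private final Queue<PendingSet> pending = new ArrayDeque<>();

    /** Queues a set request, copying the payload so later changes by the caller cannot leak in. */
    synchronized void aset(String path, byte[] data) {
        byte[] copy = (data == null) ? null : Arrays.copyOf(data, data.length);
        pending.add(new PendingSet(path, copy));
    }

    /** Removes the next queued request, or returns null if none is pending. */
    synchronized PendingSet poll() {
        return pending.poll();
    }
}
```

The cost is one extra copy per request; in exchange the library owns its data and the caller may reuse or release its buffer as soon as the call returns.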
242187 No Perforce job exists for this issue. 1 12734
6 years, 19 weeks ago 0|i02jc7:
ZooKeeper ZOOKEEPER-1518

Mailing List link is broken in the Zookeeper documentation

Bug Resolved Major Fixed Patrick D. Hunt Kiran BC Kiran BC 25/Jul/12 05:28   01/Aug/12 15:09 01/Aug/12 15:09 3.4.3   documentation   0 2   The Mailing List link under the Miscellaneous section of the Zookeeper documentation is broken.
The link is:
http://zookeeper.apache.org/mailing_lists.html
242188 No Perforce job exists for this issue. 0 12735
7 years, 34 weeks, 1 day ago 0|i02jcf:
ZooKeeper ZOOKEEPER-1517

zookeeper follower closed

Bug Resolved Major Invalid Unassigned liuli liuli 24/Jul/12 08:52   02/Aug/12 20:30 02/Aug/12 20:17 3.3.5 3.3.5 quorum   0 1   zookeeper version 3.3.5
Hadoop version 0.20.205.0
I have Hadoop and Zookeeper installed

the zoo.cfg is :

tickTime=2000
dataDir=/home/hduser/zookeeper/conf
clientPort=2181
initLimit=10
syncLimit=5
server.1=rsmm-master:2888:3888
server.2=rsmm-slave-1:2888:3888
server.3=rsmm-slave-2:2888:3888
server.4=rsmm-slave-3:2888:3888
server.5=rsmm-slave-4:2888:3888
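The tick-based settings in this zoo.cfg translate into wall-clock limits by multiplying by tickTime. The sketch below parses the file with java.util.Properties and derives those values; `ZooCfgSummary` is a hypothetical helper. The 2x/20x tickTime session-timeout defaults match the "minSessionTimeout 4000 maxSessionTimeout 40000" line that the server logs below with tickTime=2000 and both limits unset.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

final class ZooCfgSummary {
    final int tickTime;
    final int initLimit;
    final int syncLimit;
    final int servers;

    ZooCfgSummary(String cfgText) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(cfgText));
        } catch (IOException e) {
            throw new IllegalArgumentException("unreadable config", e);
        }
        tickTime = Integer.parseInt(p.getProperty("tickTime").trim());
        initLimit = Integer.parseInt(p.getProperty("initLimit").trim());
        syncLimit = Integer.parseInt(p.getProperty("syncLimit").trim());
        int n = 0;
        for (String key : p.stringPropertyNames()) {
            if (key.startsWith("server.")) { // one entry per ensemble member
                n++;
            }
        }
        servers = n;
    }

    int initLimitMs() { return tickTime * initLimit; }
    int syncLimitMs() { return tickTime * syncLimit; }
    // Defaults applied when minSessionTimeout/maxSessionTimeout are not set.
    int minSessionTimeoutMs() { return 2 * tickTime; }
    int maxSessionTimeoutMs() { return 20 * tickTime; }
    // Servers that must agree under majority quorums.
    int quorumSize() { return servers / 2 + 1; }
}
```

For the five-server file above this gives 20000 ms to connect and sync, 10000 ms thereafter, and a quorum of 3.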

=====================================
I tried to start zookeeper,
./zkServer.sh start
./zkServer.sh status

JMX enabled by default
Using config: /home/hduser/zookeeper/bin/../conf/zoo.cfg
Mode: follower



The follower (rsmm-slave-4) logs complain:

2012-07-24 20:29:35,903 - WARN [Thread-9:QuorumCnxManager$RecvWorker@727] - Connection broken for id 5, my id = 2, error = java.io.IOException: Channel eof
2012-07-24 20:29:35,904 - WARN [Thread-9:QuorumCnxManager$RecvWorker@730] - Interrupting SendWorker
2012-07-24 20:29:35,905 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@82] - Exception when following the leader
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:148)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:78)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:645)
2012-07-24 20:29:35,905 - WARN [Thread-8:QuorumCnxManager$SendWorker@633] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2094)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:370)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:622)
2012-07-24 20:29:35,907 - WARN [Thread-8:QuorumCnxManager$SendWorker@642] - Send worker leaving thread
2012-07-24 20:29:35,907 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@165] - shutdown called
java.lang.Exception: shutdown Follower
at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:165)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:649)
2012-07-24 20:29:35,913 - INFO [FollowerRequestProcessor:2:FollowerRequestProcessor@93] - FollowerRequestProcessor exited loop!
2012-07-24 20:29:35,914 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@370] - shutdown of request processor complete
2012-07-24 20:29:35,914 - INFO [CommitProcessor:2:CommitProcessor@148] - CommitProcessor exited loop!
2012-07-24 20:29:35,915 - INFO [SyncThread:2:SyncRequestProcessor@151] - SyncRequestProcessor exited!
2012-07-24 20:29:35,916 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 1 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 1 (n.sid), FOLLOWING (my state)
2012-07-24 20:29:35,916 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@621] - LOOKING
2012-07-24 20:29:35,918 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FileSnap@82] - Reading snapshot /home/hduser/zookeeper/conf/version-2/snapshot.100000000
2012-07-24 20:29:35,919 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@663] - New election. My id = 2, Proposed zxid = 4294967296
2012-07-24 20:29:35,919 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 2 (n.sid), LOOKING (my state)
2012-07-24 20:29:35,920 - WARN [WorkerSender Thread:QuorumCnxManager@384] - Cannot open channel to 5 at election address rsmm-slave-4/109.123.121.27:3888
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:340)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:360)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:333)
at java.lang.Thread.run(Thread.java:679)
2012-07-24 20:29:35,920 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 0 (n.zxid), 2 (n.round), LOOKING (n.state), 3 (n.sid), LOOKING (my state)
2012-07-24 20:29:35,922 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 1 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 3 (n.sid), LOOKING (my state)
2012-07-24 20:29:35,926 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 4 (n.leader), 0 (n.zxid), 2 (n.round), LOOKING (n.state), 4 (n.sid), LOOKING (my state)
2012-07-24 20:29:35,928 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 4 (n.sid), LOOKING (my state)
2012-07-24 20:29:35,932 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 1 (n.sid), LOOKING (my state)
2012-07-24 20:29:35,936 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 3 (n.sid), LOOKING (my state)
2012-07-24 20:29:36,137 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@655] - LEADING
2012-07-24 20:29:36,141 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Leader@55] - TCP NoDelay set to: true
2012-07-24 20:29:36,143 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@154] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /home/hduser/zookeeper/conf/version-2 snapdir /home/hduser/zookeeper/conf/version-2
2012-07-24 20:29:36,147 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FileSnap@82] - Reading snapshot /home/hduser/zookeeper/conf/version-2/snapshot.100000000
2012-07-24 20:29:36,148 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FileTxnSnapLog@254] - Snapshotting: 100000000
2012-07-24 20:29:37,149 - INFO [LearnerHandler-/109.123.121.26:34087:LearnerHandler@249] - Follower sid: 4 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@1c74f37
2012-07-24 20:29:37,150 - INFO [LearnerHandler-/109.123.121.26:34087:LearnerHandler@273] - Synchronizing with Follower sid: 4 maxCommittedLog =0 minCommittedLog = 0 peerLastZxid = 0
2012-07-24 20:29:37,151 - INFO [LearnerHandler-/109.123.121.26:34087:LearnerHandler@357] - Sending snapshot last zxid of peer is 0x0 zxid of leader is 0x200000000sent zxid of db as 0x100000000
2012-07-24 20:29:37,152 - INFO [LearnerHandler-/109.123.121.23:41659:LearnerHandler@249] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@a17083
2012-07-24 20:29:37,153 - INFO [LearnerHandler-/109.123.121.23:41659:LearnerHandler@273] - Synchronizing with Follower sid: 1 maxCommittedLog =0 minCommittedLog = 0 peerLastZxid = 100000000
2012-07-24 20:29:37,154 - INFO [LearnerHandler-/109.123.121.23:41659:LearnerHandler@357] - Sending snapshot last zxid of peer is 0x100000000 zxid of leader is 0x200000000sent zxid of db as 0x100000000
2012-07-24 20:29:37,156 - INFO [LearnerHandler-/109.123.121.25:54707:LearnerHandler@249] - Follower sid: 3 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@16fe0f4
2012-07-24 20:29:37,156 - INFO [LearnerHandler-/109.123.121.25:54707:LearnerHandler@273] - Synchronizing with Follower sid: 3 maxCommittedLog =0 minCommittedLog = 0 peerLastZxid = 0
2012-07-24 20:29:37,157 - INFO [LearnerHandler-/109.123.121.25:54707:LearnerHandler@357] - Sending snapshot last zxid of peer is 0x0 zxid of leader is 0x200000000sent zxid of db as 0x100000000
2012-07-24 20:29:37,159 - WARN [LearnerHandler-/109.123.121.26:34087:Leader@492] - Commiting zxid 0x200000000 from /109.123.121.24:2888 not first!
2012-07-24 20:29:37,160 - WARN [LearnerHandler-/109.123.121.26:34087:Leader@494] - First is 0
2012-07-24 20:29:37,172 - INFO [LearnerHandler-/109.123.121.26:34087:Leader@518] - Have quorum of supporters; starting up and setting last processed zxid: 8589934592
2012-07-24 20:30:40,397 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 5 (n.leader), 0 (n.zxid), 1 (n.round), LOOKING (n.state), 5 (n.sid), LEADING (my state)
2012-07-24 20:30:40,397 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 1 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 5 (n.sid), LEADING (my state)
2012-07-24 20:30:40,398 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 5 (n.sid), LEADING (my state)
2012-07-24 20:30:40,400 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 5 (n.sid), LEADING (my state)
2012-07-24 20:30:40,641 - INFO [LearnerHandler-/109.123.121.27:34526:LearnerHandler@249] - Follower sid: 5 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@15663a2
2012-07-24 20:30:40,642 - INFO [LearnerHandler-/109.123.121.27:34526:LearnerHandler@273] - Synchronizing with Follower sid: 5 maxCommittedLog =0 minCommittedLog = 0 peerLastZxid = 0
2012-07-24 20:30:40,642 - INFO [LearnerHandler-/109.123.121.27:34526:LearnerHandler@357] - Sending snapshot last zxid of peer is 0x0 zxid of leader is 0x200000000sent zxid of db as 0x200000000

2012-07-24 20:30:37,768 - INFO [main:QuorumPeerConfig@90] - Reading configuration from: /home/hduser/zookeeper/bin/../conf/zoo.cfg
2012-07-24 20:30:37,774 - INFO [main:QuorumPeerConfig@310] - Defaulting to majority quorums
2012-07-24 20:30:37,792 - INFO [main:QuorumPeerMain@119] - Starting quorum peer
2012-07-24 20:30:37,820 - INFO [main:NIOServerCnxn$Factory@143] - binding to port 0.0.0.0/0.0.0.0:2181
2012-07-24 20:30:37,845 - INFO [main:QuorumPeer@819] - tickTime set to 2000
2012-07-24 20:30:37,845 - INFO [main:QuorumPeer@830] - minSessionTimeout set to -1
2012-07-24 20:30:37,846 - INFO [main:QuorumPeer@841] - maxSessionTimeout set to -1
2012-07-24 20:30:37,846 - INFO [main:QuorumPeer@856] - initLimit set to 10
2012-07-24 20:30:37,863 - INFO [main:FileSnap@82] - Reading snapshot /home/hduser/zookeeper/conf/version-2/snapshot.0
2012-07-24 20:30:37,895 - INFO [Thread-1:QuorumCnxManager$Listener@473] - My election bind port: 3888
2012-07-24 20:30:37,909 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@621] - LOOKING
2012-07-24 20:30:37,912 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@663] - New election. My id = 5, Proposed zxid = 0
2012-07-24 20:30:37,923 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 5 (n.leader), 0 (n.zxid), 1 (n.round), LOOKING (n.state), 1 (n.sid), LOOKING (my state)
2012-07-24 20:30:37,923 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 1 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 1 (n.sid), LOOKING (my state)
2012-07-24 20:30:37,924 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 1 (n.sid), LOOKING (my state)
2012-07-24 20:30:37,924 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), FOLLOWING (n.state), 1 (n.sid), LOOKING (my state)
2012-07-24 20:30:37,925 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@721] - Updating proposal
2012-07-24 20:30:37,928 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 5 (n.leader), 0 (n.zxid), 1 (n.round), LOOKING (n.state), 5 (n.sid), LOOKING (my state)
2012-07-24 20:30:37,929 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 1 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 5 (n.sid), LOOKING (my state)
2012-07-24 20:30:37,929 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 5 (n.sid), LOOKING (my state)
2012-07-24 20:30:37,931 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 1 (n.round), LOOKING (n.state), 2 (n.sid), LOOKING (my state)
2012-07-24 20:30:37,932 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 2 (n.sid), LOOKING (my state)
2012-07-24 20:30:37,932 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LEADING (n.state), 2 (n.sid), LOOKING (my state)
2012-07-24 20:30:37,933 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), FOLLOWING (n.state), 1 (n.sid), LOOKING (my state)
2012-07-24 20:30:37,933 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 5 (n.leader), 0 (n.zxid), 1 (n.round), LOOKING (n.state), 3 (n.sid), LOOKING (my state)
2012-07-24 20:30:37,934 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 5 (n.sid), LOOKING (my state)
2012-07-24 20:30:37,935 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), FOLLOWING (n.state), 1 (n.sid), LOOKING (my state)
2012-07-24 20:30:37,935 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 0 (n.zxid), 2 (n.round), LOOKING (n.state), 3 (n.sid), LOOKING (my state)
2012-07-24 20:30:37,936 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 1 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 3 (n.sid), LOOKING (my state)
2012-07-24 20:30:37,937 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LEADING (n.state), 2 (n.sid), LOOKING (my state)
2012-07-24 20:30:37,937 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 3 (n.sid), LOOKING (my state)
2012-07-24 20:30:37,938 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), FOLLOWING (n.state), 3 (n.sid), LOOKING (my state)
2012-07-24 20:30:37,938 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LEADING (n.state), 2 (n.sid), LOOKING (my state)
2012-07-24 20:30:37,938 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), FOLLOWING (n.state), 3 (n.sid), LOOKING (my state)
2012-07-24 20:30:37,939 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), FOLLOWING (n.state), 3 (n.sid), LOOKING (my state)
2012-07-24 20:30:37,939 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LEADING (n.state), 2 (n.sid), LOOKING (my state)
2012-07-24 20:30:37,939 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 5 (n.leader), 0 (n.zxid), 1 (n.round), LOOKING (n.state), 4 (n.sid), LOOKING (my state)
2012-07-24 20:30:37,940 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 4 (n.leader), 0 (n.zxid), 2 (n.round), LOOKING (n.state), 4 (n.sid), LOOKING (my state)
2012-07-24 20:30:37,941 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), FOLLOWING (n.state), 3 (n.sid), LOOKING (my state)
2012-07-24 20:30:37,941 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 4 (n.sid), LOOKING (my state)
2012-07-24 20:30:37,941 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), FOLLOWING (n.state), 4 (n.sid), LOOKING (my state)
2012-07-24 20:30:37,942 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), FOLLOWING (n.state), 4 (n.sid), LOOKING (my state)
2012-07-24 20:30:37,942 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), FOLLOWING (n.state), 4 (n.sid), LOOKING (my state)
2012-07-24 20:30:37,942 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), FOLLOWING (n.state), 4 (n.sid), LOOKING (my state)
2012-07-24 20:30:38,143 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@643] - FOLLOWING
2012-07-24 20:30:38,150 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Learner@80] - TCP NoDelay set to: true
2012-07-24 20:30:38,157 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:zookeeper.version=3.3.5-1301095, built on 03/15/2012 19:48 GMT
2012-07-24 20:30:38,157 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:host.name=rsmm-slave-4
2012-07-24 20:30:38,158 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:java.version=1.6.0_23
2012-07-24 20:30:38,158 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:java.vendor=Sun Microsystems Inc.
2012-07-24 20:30:38,158 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:java.home=/usr/lib/jvm/java-6-openjdk/jre
2012-07-24 20:30:38,159 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:java.class.path=/home/hduser/zookeeper/bin/../build/classes:/home/hduser/zookeeper/bin/../build/lib/*.jar:/home/hduser/zookeeper/bin/../zookeeper-3.3.5.jar:/home/hduser/zookeeper/bin/../lib/log4j-1.2.15.jar:/home/hduser/zookeeper/bin/../lib/jline-0.9.94.jar:/home/hduser/zookeeper/bin/../src/java/lib/*.jar:/home/hduser/zookeeper/bin/../conf:
2012-07-24 20:30:38,159 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:java.library.path=/usr/lib/jvm/java-6-openjdk/jre/lib/i386/client:/usr/lib/jvm/java-6-openjdk/jre/lib/i386:/usr/lib/jvm/java-6-openjdk/jre/../lib/i386:/usr/java/packages/lib/i386:/usr/lib/jni:/lib:/usr/lib
2012-07-24 20:30:38,159 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:java.io.tmpdir=/tmp
2012-07-24 20:30:38,159 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:java.compiler=<NA>
2012-07-24 20:30:38,160 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:os.name=Linux
2012-07-24 20:30:38,160 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:os.arch=i386
2012-07-24 20:30:38,160 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:os.version=3.0.0-12-generic
2012-07-24 20:30:38,160 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:user.name=hduser
2012-07-24 20:30:38,160 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:user.home=/home/hduser
2012-07-24 20:30:38,161 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:user.dir=/home/hduser/zookeeper/bin
2012-07-24 20:30:38,162 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@154] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /home/hduser/zookeeper/conf/version-2 snapdir /home/hduser/zookeeper/conf/version-2
2012-07-24 20:30:38,175 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Learner@294] - Getting a snapshot from leader
2012-07-24 20:30:38,179 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Learner@326] - Setting leader epoch 2
2012-07-24 20:30:38,180 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FileTxnSnapLog@254] - Snapshotting: 200000000
2012-07-24 20:30:46,564 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251] - Accepted socket connection from /127.0.0.1:41116
2012-07-24 20:30:46,569 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1237] - Processing srvr command from /127.0.0.1:41116
2012-07-24 20:30:46,573 - INFO [Thread-10:NIOServerCnxn@1435] - Closed socket connection for client /127.0.0.1:41116 (no session established for client)
2012-07-24 20:33:27,407 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251] - Accepted socket connection from /127.0.0.1:41118
2012-07-24 20:33:27,408 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1237] - Processing srvr command from /127.0.0.1:41118
2012-07-24 20:33:27,411 - INFO [Thread-11:NIOServerCnxn@1435] - Closed socket connection for client /127.0.0.1:41118 (no session established for client)
2012-07-24 20:47:21,659 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251] - Accepted socket connection from /127.0.0.1:41126
2012-07-24 20:47:21,660 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1237] - Processing srvr command from /127.0.0.1:41126
2012-07-24 20:47:21,663 - INFO [Thread-12:NIOServerCnxn@1435] - Closed socket connection for client /127.0.0.1:41126 (no session established for client)
==================================
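The logs above print zxids in both decimal (4294967296, 8589934592) and hex (0x100000000, snapshot.100000000, 0x200000000). A ZooKeeper zxid packs the leader epoch into its high 32 bits and a per-epoch counter into its low 32 bits, so the two forms can be reconciled with a couple of bit operations. `Zxid` is a hypothetical helper used only for illustration.

```java
final class Zxid {
    /** Leader epoch: the high 32 bits. */
    static long epoch(long zxid) { return zxid >>> 32; }

    /** Transaction counter within the epoch: the low 32 bits. */
    static long counter(long zxid) { return zxid & 0xffffffffL; }

    /** Hex form as used in snapshot/log file names, e.g. snapshot.100000000. */
    static String hex(long zxid) { return Long.toHexString(zxid); }
}
```

Read this way, 4294967296 is epoch 1, counter 0, and 8589934592 (0x200000000) is epoch 2, counter 0, consistent with "Setting leader epoch 2" in the follower log.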

while the leader's log shows:

2012-07-24 20:22:33,769 - INFO [main:QuorumPeerConfig@90] - Reading configuration from: /home/hduser/zookeeper/bin/../conf/zoo.cfg
2012-07-24 20:22:33,776 - INFO [main:QuorumPeerConfig@310] - Defaulting to majority quorums
2012-07-24 20:22:33,795 - INFO [main:QuorumPeerMain@119] - Starting quorum peer
2012-07-24 20:22:33,827 - INFO [main:NIOServerCnxn$Factory@143] - binding to port 0.0.0.0/0.0.0.0:2181
2012-07-24 20:22:33,854 - INFO [main:QuorumPeer@819] - tickTime set to 2000
2012-07-24 20:22:33,854 - INFO [main:QuorumPeer@830] - minSessionTimeout set to -1
2012-07-24 20:22:33,855 - INFO [main:QuorumPeer@841] - maxSessionTimeout set to -1
2012-07-24 20:22:33,855 - INFO [main:QuorumPeer@856] - initLimit set to 10
2012-07-24 20:22:33,874 - INFO [main:FileSnap@82] - Reading snapshot /home/hduser/zookeeper/conf/version-2/snapshot.100000000
2012-07-24 20:22:33,905 - INFO [Thread-1:QuorumCnxManager$Listener@473] - My election bind port: 3888
2012-07-24 20:22:33,923 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@621] - LOOKING
2012-07-24 20:22:33,926 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@663] - New election. My id = 2, Proposed zxid = 4294967296
2012-07-24 20:22:33,935 - INFO [WorkerSender Thread:QuorumCnxManager@183] - Have smaller server identifier, so dropping the connection: (3, 2)
2012-07-24 20:22:33,935 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 1 (n.round), LOOKING (n.state), 2 (n.sid), LOOKING (my state)
2012-07-24 20:22:33,936 - INFO [WorkerSender Thread:QuorumCnxManager@183] - Have smaller server identifier, so dropping the connection: (4, 2)
2012-07-24 20:22:33,937 - INFO [WorkerSender Thread:QuorumCnxManager@183] - Have smaller server identifier, so dropping the connection: (5, 2)
2012-07-24 20:22:33,938 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 5 (n.leader), 0 (n.zxid), 1 (n.round), LOOKING (n.state), 1 (n.sid), LOOKING (my state)
2012-07-24 20:22:33,939 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 5 (n.leader), 0 (n.zxid), 1 (n.round), FOLLOWING (n.state), 1 (n.sid), LOOKING (my state)
2012-07-24 20:22:33,941 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 5 (n.leader), 0 (n.zxid), 1 (n.round), FOLLOWING (n.state), 3 (n.sid), LOOKING (my state)
2012-07-24 20:22:33,941 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 5 (n.leader), 0 (n.zxid), 1 (n.round), FOLLOWING (n.state), 4 (n.sid), LOOKING (my state)
2012-07-24 20:22:33,942 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 5 (n.leader), 0 (n.zxid), 1 (n.round), FOLLOWING (n.state), 4 (n.sid), LOOKING (my state)
2012-07-24 20:22:33,943 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 5 (n.leader), 0 (n.zxid), 1 (n.round), FOLLOWING (n.state), 3 (n.sid), LOOKING (my state)
2012-07-24 20:22:33,945 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 5 (n.leader), 0 (n.zxid), 1 (n.round), LEADING (n.state), 5 (n.sid), LOOKING (my state)
2012-07-24 20:22:33,945 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 5 (n.leader), 0 (n.zxid), 1 (n.round), LEADING (n.state), 5 (n.sid), FOLLOWING (my state)
2012-07-24 20:22:33,946 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@643] - FOLLOWING
2012-07-24 20:22:33,952 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Learner@80] - TCP NoDelay set to: true
2012-07-24 20:22:33,959 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:zookeeper.version=3.3.5-1301095, built on 03/15/2012 19:48 GMT
2012-07-24 20:22:33,960 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:host.name=rsmm-slave-1
2012-07-24 20:22:33,960 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:java.version=1.6.0_23
2012-07-24 20:22:33,960 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:java.vendor=Sun Microsystems Inc.
2012-07-24 20:22:33,961 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:java.home=/usr/lib/jvm/java-6-openjdk/jre
2012-07-24 20:22:33,961 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:java.class.path=/home/hduser/zookeeper/bin/../build/classes:/home/hduser/zookeeper/bin/../build/lib/*.jar:/home/hduser/zookeeper/bin/../zookeeper-3.3.5.jar:/home/hduser/zookeeper/bin/../lib/log4j-1.2.15.jar:/home/hduser/zookeeper/bin/../lib/jline-0.9.94.jar:/home/hduser/zookeeper/bin/../src/java/lib/*.jar:/home/hduser/zookeeper/bin/../conf:
2012-07-24 20:22:33,961 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:java.library.path=/usr/lib/jvm/java-6-openjdk/jre/lib/i386/client:/usr/lib/jvm/java-6-openjdk/jre/lib/i386:/usr/lib/jvm/java-6-openjdk/jre/../lib/i386:/usr/java/packages/lib/i386:/usr/lib/jni:/lib:/usr/lib
2012-07-24 20:22:33,961 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:java.io.tmpdir=/tmp
2012-07-24 20:22:33,962 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:java.compiler=<NA>
2012-07-24 20:22:33,962 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:os.name=Linux
2012-07-24 20:22:33,962 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:os.arch=i386
2012-07-24 20:22:33,962 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:os.version=3.0.0-12-generic
2012-07-24 20:22:33,962 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:user.name=hduser
2012-07-24 20:22:33,963 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:user.home=/home/hduser
2012-07-24 20:22:33,963 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:user.dir=/home/hduser/zookeeper/bin
2012-07-24 20:22:33,965 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@154] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /home/hduser/zookeeper/conf/version-2 snapdir /home/hduser/zookeeper/conf/version-2
2012-07-24 20:22:33,977 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Learner@291] - Getting a diff from the leader 0x100000000
2012-07-24 20:22:33,981 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Learner@326] - Setting leader epoch 1
2012-07-24 20:22:33,983 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FileTxnSnapLog@254] - Snapshotting: 100000000
2012-07-24 20:22:40,102 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251] - Accepted socket connection from /127.0.0.1:41400
2012-07-24 20:22:40,106 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1237] - Processing srvr command from /127.0.0.1:41400
2012-07-24 20:22:40,109 - INFO [Thread-10:NIOServerCnxn@1435] - Closed socket connection for client /127.0.0.1:41400 (no session established for client)
2012-07-24 20:29:35,903 - WARN [Thread-9:QuorumCnxManager$RecvWorker@727] - Connection broken for id 5, my id = 2, error = java.io.IOException: Channel eof
2012-07-24 20:29:35,904 - WARN [Thread-9:QuorumCnxManager$RecvWorker@730] - Interrupting SendWorker
2012-07-24 20:29:35,905 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@82] - Exception when following the leader
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:148)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:78)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:645)
2012-07-24 20:29:35,905 - WARN [Thread-8:QuorumCnxManager$SendWorker@633] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2094)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:370)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:622)
2012-07-24 20:29:35,907 - WARN [Thread-8:QuorumCnxManager$SendWorker@642] - Send worker leaving thread
2012-07-24 20:29:35,907 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@165] - shutdown called
java.lang.Exception: shutdown Follower
at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:165)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:649)
2012-07-24 20:29:35,913 - INFO [FollowerRequestProcessor:2:FollowerRequestProcessor@93] - FollowerRequestProcessor exited loop!
2012-07-24 20:29:35,914 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@370] - shutdown of request processor complete
2012-07-24 20:29:35,914 - INFO [CommitProcessor:2:CommitProcessor@148] - CommitProcessor exited loop!
2012-07-24 20:29:35,915 - INFO [SyncThread:2:SyncRequestProcessor@151] - SyncRequestProcessor exited!
2012-07-24 20:29:35,916 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 1 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 1 (n.sid), FOLLOWING (my state)
2012-07-24 20:29:35,916 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@621] - LOOKING
2012-07-24 20:29:35,918 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FileSnap@82] - Reading snapshot /home/hduser/zookeeper/conf/version-2/snapshot.100000000
2012-07-24 20:29:35,919 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@663] - New election. My id = 2, Proposed zxid = 4294967296
2012-07-24 20:29:35,919 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 2 (n.sid), LOOKING (my state)
2012-07-24 20:29:35,920 - WARN [WorkerSender Thread:QuorumCnxManager@384] - Cannot open channel to 5 at election address rsmm-slave-4/109.123.121.27:3888
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:340)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:360)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:333)
at java.lang.Thread.run(Thread.java:679)
2012-07-24 20:29:35,920 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 0 (n.zxid), 2 (n.round), LOOKING (n.state), 3 (n.sid), LOOKING (my state)
2012-07-24 20:29:35,922 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 1 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 3 (n.sid), LOOKING (my state)
2012-07-24 20:29:35,926 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 4 (n.leader), 0 (n.zxid), 2 (n.round), LOOKING (n.state), 4 (n.sid), LOOKING (my state)
2012-07-24 20:29:35,928 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 4 (n.sid), LOOKING (my state)
2012-07-24 20:29:35,932 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 1 (n.sid), LOOKING (my state)
2012-07-24 20:29:35,936 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 3 (n.sid), LOOKING (my state)
2012-07-24 20:29:36,137 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@655] - LEADING
2012-07-24 20:29:36,141 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Leader@55] - TCP NoDelay set to: true
2012-07-24 20:29:36,143 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@154] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /home/hduser/zookeeper/conf/version-2 snapdir /home/hduser/zookeeper/conf/version-2
2012-07-24 20:29:36,147 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FileSnap@82] - Reading snapshot /home/hduser/zookeeper/conf/version-2/snapshot.100000000
2012-07-24 20:29:36,148 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FileTxnSnapLog@254] - Snapshotting: 100000000
2012-07-24 20:29:37,149 - INFO [LearnerHandler-/109.123.121.26:34087:LearnerHandler@249] - Follower sid: 4 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@1c74f37
2012-07-24 20:29:37,150 - INFO [LearnerHandler-/109.123.121.26:34087:LearnerHandler@273] - Synchronizing with Follower sid: 4 maxCommittedLog =0 minCommittedLog = 0 peerLastZxid = 0
2012-07-24 20:29:37,151 - INFO [LearnerHandler-/109.123.121.26:34087:LearnerHandler@357] - Sending snapshot last zxid of peer is 0x0 zxid of leader is 0x200000000sent zxid of db as 0x100000000
2012-07-24 20:29:37,152 - INFO [LearnerHandler-/109.123.121.23:41659:LearnerHandler@249] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@a17083
2012-07-24 20:29:37,153 - INFO [LearnerHandler-/109.123.121.23:41659:LearnerHandler@273] - Synchronizing with Follower sid: 1 maxCommittedLog =0 minCommittedLog = 0 peerLastZxid = 100000000
2012-07-24 20:29:37,154 - INFO [LearnerHandler-/109.123.121.23:41659:LearnerHandler@357] - Sending snapshot last zxid of peer is 0x100000000 zxid of leader is 0x200000000sent zxid of db as 0x100000000
2012-07-24 20:29:37,156 - INFO [LearnerHandler-/109.123.121.25:54707:LearnerHandler@249] - Follower sid: 3 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@16fe0f4
2012-07-24 20:29:37,156 - INFO [LearnerHandler-/109.123.121.25:54707:LearnerHandler@273] - Synchronizing with Follower sid: 3 maxCommittedLog =0 minCommittedLog = 0 peerLastZxid = 0
2012-07-24 20:29:37,157 - INFO [LearnerHandler-/109.123.121.25:54707:LearnerHandler@357] - Sending snapshot last zxid of peer is 0x0 zxid of leader is 0x200000000sent zxid of db as 0x100000000
2012-07-24 20:29:37,159 - WARN [LearnerHandler-/109.123.121.26:34087:Leader@492] - Commiting zxid 0x200000000 from /109.123.121.24:2888 not first!
2012-07-24 20:29:37,160 - WARN [LearnerHandler-/109.123.121.26:34087:Leader@494] - First is 0
2012-07-24 20:29:37,172 - INFO [LearnerHandler-/109.123.121.26:34087:Leader@518] - Have quorum of supporters; starting up and setting last processed zxid: 8589934592
2012-07-24 20:30:40,397 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 5 (n.leader), 0 (n.zxid), 1 (n.round), LOOKING (n.state), 5 (n.sid), LEADING (my state)
2012-07-24 20:30:40,397 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 1 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 5 (n.sid), LEADING (my state)
2012-07-24 20:30:40,398 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 5 (n.sid), LEADING (my state)
2012-07-24 20:30:40,400 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 5 (n.sid), LEADING (my state)
2012-07-24 20:30:40,641 - INFO [LearnerHandler-/109.123.121.27:34526:LearnerHandler@249] - Follower sid: 5 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@15663a2
2012-07-24 20:30:40,642 - INFO [LearnerHandler-/109.123.121.27:34526:LearnerHandler@273] - Synchronizing with Follower sid: 5 maxCommittedLog =0 minCommittedLog = 0 peerLastZxid = 0
2012-07-24 20:30:40,642 - INFO [LearnerHandler-/109.123.121.27:34526:LearnerHandler@357] - Sending snapshot last zxid of peer is 0x0 zxid of leader is 0x200000000sent zxid of db as 0x200000000
2012-07-24 20:49:19,788 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251] - Accepted socket connection from /127.0.0.1:41403
2012-07-24 20:49:19,789 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1237] - Processing srvr command from /127.0.0.1:41403
2012-07-24 20:49:19,791 - INFO [Thread-18:NIOServerCnxn@1435] - Closed socket connection for client /127.0.0.1:41403 (no session established for client)
242189 No Perforce job exists for this issue. 0 12736
7 years, 34 weeks ago I will close this one since ZooKeeper started normally 0|i02jcn:
ZooKeeper ZOOKEEPER-1516

Configurable finalizeWait for FastLeaderElection

Improvement Open Major Unresolved Unassigned Ivan Babrou Ivan Babrou 24/Jul/12 07:21   25/Jul/12 01:50   3.3.5   leaderElection, quorum, server   1 3   Gentoo Linux; any environment is affected. FastLeaderElection has final static int finalizeWait = 200. This is how long a server waits, once it believes leader election has finished, before finalizing the result. I'm not sure what the risks of lowering it are, but 200ms is too slow for a production environment under heavy load.

I changed it to 20ms and everything still works for me.

I propose making this value configurable, with a default of 200 so that current installations are not affected.

Combined with ZOOKEEPER-1515, this could make leader election roughly 10x faster: 1500ms -> 180ms, including 100ms for 2 failed connection attempts to the new leader.
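A minimal sketch of the proposal, assuming the value is read from a Java system property. The property name `zookeeper.fle.finalizeWait` and the class `FinalizeWaitConfig` are hypothetical and not part of ZooKeeper 3.3.5. The hard-coded 200ms stays the default.

```java
/** Hypothetical holder for a configurable FastLeaderElection finalize wait. */
final class FinalizeWaitConfig {
    /** Property name is an assumption for illustration only. */
    static final String PROPERTY = "zookeeper.fle.finalizeWait";
    /** Current hard-coded value, kept as the default so existing installations are unaffected. */
    static final int DEFAULT_FINALIZE_WAIT_MS = 200;

    /** Returns the configured wait in ms; missing, non-numeric, or negative values fall back to the default. */
    static int finalizeWaitMs() {
        Integer configured = Integer.getInteger(PROPERTY); // null if unset or unparsable
        if (configured == null || configured < 0) {
            return DEFAULT_FINALIZE_WAIT_MS;
        }
        return configured;
    }
}
```

Under this sketch, starting the server with `-Dzookeeper.fle.finalizeWait=20` would reproduce the 20ms experiment described above.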
performance 242190 No Perforce job exists for this issue. 0 12737
7 years, 35 weeks, 1 day ago 0|i02jcv:
ZooKeeper ZOOKEEPER-1515

Long reconnect timeout if leader failed.

Improvement Open Major Unresolved Unassigned Ivan Babrou Ivan Babrou 24/Jul/12 01:58   25/Jul/12 01:45   3.3.5   leaderElection, quorum, server   1 4   Gentoo Linux, but every environment is affected. In ZooKeeper 3.3.5, src/java/main/org/apache/zookeeper/server/quorum/Learner.java:325 contains Thread.sleep(1000);

This always happens after a leader failure or restart. ZooKeeper re-elects a new leader and all followers try to connect to it, but the first attempt always fails with "Connection refused":

{quote}
2012-07-23 18:55:48,159 - WARN [QuorumPeer:/0.0.0.0:2181:Learner@229] - Unexpected exception, tries=0, connecting to web329.local/192.168.1.74:2888
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:221)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:65)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:645)
{quote}

I propose changing this line to the following:

{code:title=Learner.java|borderStyle=solid}
if (tries > 0) {
    // First retry is immediate; later retries wait one tick instead of a fixed 1000 ms.
    Thread.sleep(self.tickTime);
}
{code}

This way the first reconnect attempt is made immediately, and subsequent attempts wait for one tick (a sensible semantic change, I think).

With this change, leader re-election time dropped from >1500ms to 300-400ms with a 50ms tick time. This is important for our production environment and will not break any existing installations.
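The proposed retry semantics can be modeled outside ZooKeeper. This is a simplified sketch, not the actual `Learner.connectToLeader` code: the `Attempt` and `Sleeper` interfaces are hypothetical stand-ins for the socket connect and `Thread.sleep`, injected so the timing can be checked without a real leader. As in the proposal, the sleep follows a failed attempt and is skipped after the first failure.

```java
import java.io.IOException;

/** Hypothetical model of a follower's connect-to-leader retry loop. */
final class LeaderConnectRetry {
    /** One connection attempt; throws IOException on failure (e.g. "Connection refused"). */
    interface Attempt { void connect() throws IOException; }
    /** Sleep abstraction so tests can record delays instead of blocking. */
    interface Sleeper { void sleep(long millis) throws InterruptedException; }

    /** Returns the number of attempts used; rethrows the last failure after maxTries attempts. */
    static int connect(Attempt attempt, Sleeper sleeper, long tickTime, int maxTries)
            throws IOException, InterruptedException {
        if (maxTries < 1) {
            throw new IllegalArgumentException("maxTries must be >= 1");
        }
        for (int tries = 0; ; tries++) {
            try {
                attempt.connect();
                return tries + 1;
            } catch (IOException e) {
                if (tries == maxTries - 1) {
                    throw e; // out of attempts
                }
            }
            if (tries > 0) {
                sleeper.sleep(tickTime); // first retry is immediate, later ones wait one tick
            }
        }
    }
}
```

With a 50ms tick, a single refused connection now costs no delay at all, and each further refusal costs one tick rather than a fixed second.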
patch, performance 242191 No Perforce job exists for this issue. 0 12738
7 years, 35 weeks, 1 day ago 0|i02jd3:
ZooKeeper ZOOKEEPER-1514

FastLeaderElection - leader ignores the round information when joining a quorum

Bug Resolved Critical Fixed Flavio Paiva Junqueira Patrick D. Hunt Patrick D. Hunt 19/Jul/12 20:52   03/Aug/12 06:55 02/Aug/12 18:29 3.3.4 3.4.4, 3.5.0 quorum   0 6   In the following case we have a 3-server ensemble.

Initially all is well; zk3 is the leader.

However, zk3 fails, restarts, and rejoins the quorum as the new leader (it was the old leader and is still the leader after re-election).

The two existing followers, zk1 and zk2, rejoin the new quorum as followers of zk3.

zk1 then fails, its data directory is deleted (so it has no state whatsoever), and it is restarted. However, zk1 can never rejoin the quorum (even after an hour). During this time zk2 and zk3 are serving properly.

Later, all three servers are restarted and properly form a functional quorum.


Here are some interesting log snippets. Nothing else of interest was seen in the logs during this time:

zk3: this is where it becomes the leader again after failing (as the leader). Notice that its "round" is ahead of zk1's and zk2's:

{noformat}
2012-07-18 17:19:35,423 - INFO [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@663] - New election. My id = 3, Proposed zxid = 77309411648
2012-07-18 17:19:35,423 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 77309411648 (n.zxid), 832 (n.round), LOOKING (n.state), 3 (n.sid), LOOKING (my state)
2012-07-18 17:19:35,424 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 73014444480 (n.zxid), 831 (n.round), FOLLOWING (n.state), 2 (n.sid), LOOKING (my state)
2012-07-18 17:19:35,424 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 73014444480 (n.zxid), 831 (n.round), FOLLOWING (n.state), 1 (n.sid), LOOKING (my state)
2012-07-18 17:19:35,424 - INFO [QuorumPeer:/0.0.0.0:2181:QuorumPeer@655] - LEADING
{noformat}

zk1, which won't come back. Notice that zk3 reports the round as 831, while zk2 reports it as 832:

{noformat}
2012-07-18 17:31:12,015 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 1 (n.leader), 77309411648 (n.zxid), 1 (n.round), LOOKING (n.state), 1 (n.sid), LOOKING (my state)
2012-07-18 17:31:12,016 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 73014444480 (n.zxid), 831 (n.round), LEADING (n.state), 3 (n.sid), LOOKING (my state)
2012-07-18 17:31:12,017 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 77309411648 (n.zxid), 832 (n.round), FOLLOWING (n.state), 2 (n.sid), LOOKING (my state)
2012-07-18 17:31:15,219 - INFO [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@697] - Notification time out: 6400
{noformat}
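To illustrate why the disagreement can keep zk1 out indefinitely, consider a simplified model, which is an assumption and not the actual FastLeaderElection code: a LOOKING peer accepts a leader only once a majority of the LEADING/FOLLOWING notifications it receives carry an identical (leader, zxid, round) triple. In the zk1 log above, zk3 and zk2 both name leader 3 but report different zxids and rounds (831 vs 832), so no triple ever reaches a majority. The class name is hypothetical, and the record requires Java 16+.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified, hypothetical model of how a LOOKING peer tallies out-of-election notifications. */
final class OutOfElectionTally {
    /** A notification's vote as printed in the log: n.leader, n.zxid, n.round. */
    record Vote(long leader, long zxid, long round) {}

    /** True if a majority of the ensemble reported an identical vote. */
    static boolean hasQuorum(List<Vote> notifications, int ensembleSize) {
        Map<Vote, Integer> counts = new HashMap<>();
        for (Vote v : notifications) {
            counts.merge(v, 1, Integer::sum); // records compare by all components
        }
        int majority = ensembleSize / 2 + 1;
        return counts.values().stream().anyMatch(c -> c >= majority);
    }
}
```

This only illustrates the symptom seen in the logs; the fix shipped in 3.4.4/3.5.0 may address it differently.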
242192 No Perforce job exists for this issue. 4 12739
7 years, 33 weeks, 6 days ago 0|i02jdb:
ZooKeeper ZOOKEEPER-1513

"Unreasonable length" exception while starting a server.

Bug Closed Major Fixed Skye Wanderman-Milne Patrick D. Hunt Patrick D. Hunt 19/Jul/12 20:38   13/Mar/14 14:16 12/Dec/12 01:52 3.3.4 3.4.6, 3.5.0 server   0 10   The server is allowing a client to set data larger than the server can then later read:

{noformat}
2012-07-18 14:28:12,555 - FATAL [main:QuorumPeer@400] - Unable to load database on disk
java.io.IOException: Unreasonable length = 1048583
at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100)
at org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:232)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:602)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:504)
at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:131)
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:222)
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:398)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:143)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:103)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:76)
2012-07-18 14:28:12,555 - FATAL [main:QuorumPeerMain@87] - Unexpected exception, exiting abnormally
java.lang.RuntimeException: Unable to run quorum server
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:401)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:143)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:103)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:76)
Caused by: java.io.IOException: Unreasonable length = 1048583
at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100)
at org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:232)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:602)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:504)
at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:131)
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:222)
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:398)
... 3 more
{noformat}

Notice the size is 1048583 (0x100007), 7 bytes beyond 0x100000 (1 MiB).

The SetDataTxn contains the client data plus a couple of extra fields. On ingest, the server applies the jute.maxbuffer limit to the data alone (as expected), but it does not account for the fact that the data plus these extra fields may exceed the jute.maxbuffer check when the transaction is read back from disk.

The workaround was simple: set jute.maxbuffer a bit higher (and fix the misbehaving client; the data was not expected to grow this large).
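The overshoot can be estimated with a little arithmetic. This sketch assumes jute's binary encoding (a 4-byte length prefix for strings and buffers, 4 bytes per int, 8 per long), that the logged entry is a TxnHeader followed by the SetDataTxn, and the field layouts TxnHeader (clientId, cxid, zxid, time, type) and SetDataTxn (path, data, version). The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical estimate of the on-disk size of a setData transaction entry. */
final class SetDataTxnSize {
    static final int INT_BYTES = 4;
    static final int LONG_BYTES = 8;

    /** TxnHeader: long clientId, int cxid, long zxid, long time, int type. */
    static int headerBytes() {
        return LONG_BYTES + INT_BYTES + LONG_BYTES + LONG_BYTES + INT_BYTES;
    }

    /** SetDataTxn: ustring path, buffer data, int version; strings and buffers carry a 4-byte length. */
    static int setDataTxnBytes(String path, int dataLength) {
        int pathBytes = path.getBytes(StandardCharsets.UTF_8).length;
        return (INT_BYTES + pathBytes) + (INT_BYTES + dataLength) + INT_BYTES;
    }

    /** Length that would be checked against jute.maxbuffer when the entry is read back. */
    static int entryBytes(String path, int dataLength) {
        return headerBytes() + setDataTxnBytes(path, dataLength);
    }
}
```

Under these assumptions, a payload written to path `/a` produces an entry 46 bytes larger than the payload itself, so data that just passes a write-side check on its own size can still trip the read-side check.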
242193 No Perforce job exists for this issue. 4 12740
6 years, 2 weeks ago
Reviewed
0|i02jdj:
ZooKeeper ZOOKEEPER-1512

Reduce log level of missing ZookeeperSaslClient Security Exception

Bug Open Major Unresolved Unassigned Micah Whitacre Micah Whitacre 18/Jul/12 14:26   23/May/14 08:00       java client   0 5   When running the Java client you frequently get messages like the following:

org.apache.zookeeper.client.ZooKeeperSaslClient SecurityException: java.lang.SecurityException: Unable to locate a login configuration occurred when trying to find JAAS configuration.

In cases where we don't want this configuration enabled, the logs get spammed with this message. Its level should be lowered to DEBUG/TRACE to avoid flooding the logs.
242194 No Perforce job exists for this issue. 0 12741
5 years, 43 weeks, 6 days ago 0|i02jdr:
ZooKeeper ZOOKEEPER-1511

Symbolic nodes

Wish Open Major Unresolved Unassigned Sheetal Parade Sheetal Parade 17/Jul/12 14:33   17/Jul/12 14:33           0 1   ZooKeeper currently allows two types of nodes: EPHEMERAL and PERSISTENT. If a node or its data needs to be referenced from other nodes, the entire node must be copied to the new location, and there is no relation between the original and the copy. Symbolic nodes, like symbolic links in a Unix directory structure, would keep the nodes and their data in sync.

Use case:
While implementing managed clusters for a micro-shards strategy, clients can register by creating ephemeral nodes. The master process can then create new symbolic nodes alongside other clients' nodes for the next set of processes to watch. If a client goes down, its ephemeral node is cleaned up along with the symbolic node. Different sets of watchers could be registered on the symbolic nodes, and they would then be notified.
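The requested semantics can be modeled with a small in-memory tree. This is purely hypothetical: ZooKeeper has no symbolic node type, and none of these names are ZooKeeper APIs. A symbolic node stores a target path, reads resolve through it, and deleting the target (for example, an ephemeral node whose session ended) also removes every symbolic node pointing at it. Watches are not modeled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory model of the proposed symbolic nodes. */
final class SymbolicNodeTree {
    private final Map<String, byte[]> data = new HashMap<>();  // regular nodes: path -> data
    private final Map<String, String> links = new HashMap<>(); // symbolic nodes: path -> target

    void create(String path, byte[] value) {
        data.put(path, value);
    }

    void createSymbolic(String path, String target) {
        if (!data.containsKey(target)) {
            throw new IllegalArgumentException("no such target: " + target);
        }
        links.put(path, target);
    }

    /** Reads follow one level of indirection, so a symbolic node always shows the target's current data. */
    Optional<byte[]> getData(String path) {
        String resolved = links.getOrDefault(path, path);
        return Optional.ofNullable(data.get(resolved));
    }

    /** Deleting a node also removes all symbolic nodes that point at it. */
    void delete(String path) {
        data.remove(path);
        links.values().removeIf(target -> target.equals(path));
    }

    boolean exists(String path) {
        return data.containsKey(path) || links.containsKey(path);
    }
}
```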
242195 No Perforce job exists for this issue. 0 12742
7 years, 36 weeks, 2 days ago 0|i02jdz:
ZooKeeper ZOOKEEPER-1510

Should not log SASL errors for non-secure usage

Improvement Resolved Minor Fixed Todd Lipcon Todd Lipcon Todd Lipcon 16/Jul/12 15:45   25/Sep/13 12:00 01/Aug/12 17:41 3.4.3 3.4.4, 3.5.0 java client   0 5   Since SASL support was added, all connections with non-secure clients have started logging messages like:

2012-07-01 02:13:34,986 WARN org.apache.zookeeper.client.ZooKeeperSaslClient: SecurityException: java.lang.SecurityException: Unable to locate a login configuration occurred when trying to find JAAS configuration.
2012-07-01 02:13:34,986 INFO org.apache.zookeeper.client.ZooKeeperSaslClient: Client will not SASL-authenticate because the default JAAS configuration section 'Client' could not be found. If you are not using SASL, you may ignore this. On the other hand, if you expected SASL to work, please fix your JAAS configuration.

Despite the "you may ignore this" qualifier, I've seen a lot of users confused by this message. Instead, it would be better to either log at DEBUG level, or piggyback the SASL information onto the "Opening socket connection" message (e.g. "Opening socket connection to X:2181. Will not use SASL because no configuration was located.")
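A sketch of the second option, assuming the client checks for the JAAS `Client` section up front and folds the result into a single connection message. `SaslLogHint` and its methods are hypothetical names. `javax.security.auth.login.Configuration` is the standard JDK API; depending on the JDK version, a missing login configuration shows up either as a `SecurityException` or as a `null` entry lookup, so both cases are handled.

```java
import java.util.logging.Logger;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;

/** Hypothetical helper that folds SASL status into the "Opening socket connection" message. */
final class SaslLogHint {
    private static final Logger LOG = Logger.getLogger(SaslLogHint.class.getName());

    /** Returns null if the JAAS section exists, otherwise a short reason SASL will not be used. */
    static String reasonSaslDisabled(String section) {
        try {
            AppConfigurationEntry[] entries =
                    Configuration.getConfiguration().getAppConfigurationEntry(section);
            return entries == null ? "no JAAS section '" + section + "' was found" : null;
        } catch (SecurityException e) {
            // Some JDKs throw here when no login configuration exists at all.
            return "no JAAS configuration was located";
        }
    }

    static String connectionMessage(String hostPort, String section) {
        String reason = reasonSaslDisabled(section);
        String base = "Opening socket connection to " + hostPort + ".";
        return reason == null
                ? base + " Will attempt to SASL-authenticate."
                : base + " Will not use SASL because " + reason + ".";
    }

    /** One INFO line replaces the separate WARN and INFO messages quoted above. */
    static void logConnect(String hostPort) {
        LOG.info(connectionMessage(hostPort, "Client"));
    }
}
```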
242014 No Perforce job exists for this issue. 2 12499
6 years, 26 weeks, 1 day ago
Reviewed
0|i02hvz:
ZooKeeper ZOOKEEPER-1509

Please update documentation to reflect updated FreeBSD support.

Task Resolved Major Fixed George Neville-Neil George Neville-Neil George Neville-Neil 09/Jul/12 23:26   18/Nov/15 18:27 09/Oct/13 12:50 3.4.6, 3.5.0 3.5.0     0 3   I noticed on this page: http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html

that FreeBSD was listed as being supported only as a client due to a problem with the JVM on FreeBSD.

As of Friday, ZooKeeper is fully supported on FreeBSD using OpenJDK 7,
and I have created a port for it in our ports collection:

http://www.freshports.org/devel/zookeeper/

The zookeeper port currently tracks the stable release. In the near future there will also be a port tracking the current release, while the plain zookeeper port keeps tracking stable.

Please update your documentation to reflect this change in support.

Best,
George Neville-Neil
gnn@freebsd.org
newbie 242196 No Perforce job exists for this issue. 2 12743
6 years, 24 weeks ago Update documentation to reflect full FreeBSD support.
Reviewed
0|i02je7:
ZooKeeper ZOOKEEPER-1508

Reliable standalone mode through redundant databases

New Feature Open Major Unresolved Unassigned Bill Bridge Bill Bridge 09/Jul/12 19:12   12/Jul/12 17:54           0 3   Single server with multiple disks or two node cluster with multiple shared disks Currently ZooKeeper requires 3 servers to provide both reliability and availability. This is fine for large internet-scale clusters, but there are lots of two-node clusters that could benefit from ZooKeeper. There are also single-server use cases where it is highly desirable to have ZooKeeper survive a disk failure, but availability is not as important.

This feature would allow the configuration of multiple destinations for logs and snapshots. A transaction is committed when a majority of the log writes complete successfully. If one log gets an error on write, then it is taken offline until an administrator brings it online or replaces it with a new destination. ZooKeeper continues to run as long as a quorum of disks can be written.

High availability can be provided with a two-node cluster. When the ZooKeeper node dies, the disks are switched to the surviving node and a new ZooKeeper server starts there. Faster switchover is possible if an observer is already running on the surviving node.
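The commit rule described above (a transaction commits when a majority of log writes succeed, a failed destination goes offline, and the server runs only while a majority of disks is writable) can be sketched as follows. `RedundantTxnLog` and `LogDestination` are hypothetical; writes are sequential for simplicity, and a real implementation would also need fsync, recovery, and reconciliation of logs that diverge.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical majority-write transaction log over several destinations (disks). */
final class RedundantTxnLog {
    /** One place a transaction log can be written, e.g. a file on a separate disk. */
    interface LogDestination {
        void append(byte[] txn) throws IOException;
    }

    private final List<LogDestination> all;
    private final List<LogDestination> online;

    RedundantTxnLog(List<LogDestination> destinations) {
        this.all = List.copyOf(destinations);
        this.online = new ArrayList<>(destinations);
    }

    private int majority() {
        return all.size() / 2 + 1;
    }

    /** True while a majority of the configured destinations are still writable. */
    boolean available() {
        return online.size() >= majority();
    }

    /** Writes to every online destination; returns true only if a majority of all destinations succeeded. */
    boolean commit(byte[] txn) {
        int ok = 0;
        for (var it = online.iterator(); it.hasNext(); ) {
            LogDestination d = it.next();
            try {
                d.append(txn);
                ok++;
            } catch (IOException e) {
                it.remove(); // offline until an administrator restores or replaces it
            }
        }
        return ok >= majority();
    }
}
```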
242197 No Perforce job exists for this issue. 0 12744
7 years, 37 weeks ago 0|i02jef:
ZooKeeper ZOOKEEPER-1507

review reading epoch files, improve logging

Improvement Open Major Unresolved Unassigned Patrick D. Hunt Patrick D. Hunt 06/Jul/12 13:00   06/Jul/12 13:00   3.4.3, 3.5.0   server   0 1   When reading an epoch file, we should log any problems encountered at error level.

org.apache.zookeeper.server.quorum.QuorumPeer.readLongFromFile(String)

At the same time let's verify the call paths are handled properly.
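A sketch of what logging every problem could look like for an epoch-file reader. This is not the actual `QuorumPeer.readLongFromFile` implementation: the class name, the use of `java.util.logging`, and the choice to log and then rethrow (so callers still decide how to handle the failure) are assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical epoch-file reader that logs every failure at error (SEVERE) level. */
final class EpochFileReader {
    private static final Logger LOG = Logger.getLogger(EpochFileReader.class.getName());

    /** Reads a single decimal long from the file, logging and rethrowing on any problem. */
    static long readLongFromFile(Path file) throws IOException {
        String content;
        try {
            content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
        } catch (IOException e) {
            LOG.log(Level.SEVERE, "Unable to read epoch file " + file, e);
            throw e;
        }
        try {
            return Long.parseLong(content);
        } catch (NumberFormatException e) {
            LOG.log(Level.SEVERE, "Epoch file " + file + " does not contain a number: '" + content + "'", e);
            throw new IOException("Invalid epoch file " + file, e);
        }
    }
}
```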
242198 No Perforce job exists for this issue. 0 12745
7 years, 37 weeks, 6 days ago 0|i02jen:
ZooKeeper ZOOKEEPER-1506

Re-try DNS hostname -> IP resolution if node connection fails

Improvement Resolved Blocker Fixed Robert P. Thille Mike Heffner Mike Heffner 06/Jul/12 12:23   18/Jun/18 00:05 23/Sep/15 13:19 3.4.5, 3.4.6 3.4.7, 3.5.0, 3.6.0 server   29 49   Ubuntu 11.04 64-bit In our zoo.cfg we use hostnames to identify the ZK servers that are part of an ensemble. These hostnames are configured with a low (<= 60s) TTL and the IP address they map to can and does change. Our procedure for replacing/upgrading a ZK node is to boot an entirely new instance and remap the hostname to the new instance's IP address. Our expectation is that when the original ZK node is terminated/shutdown, the remaining nodes in the ensemble would reconnect to the new instance.

However, what we are noticing is that the remaining ZK nodes do not attempt to re-resolve the hostname->IP mapping for the new server. Once the original ZK node is terminated, the existing servers continue trying to contact it at the old IP address. It would be great if the ZK servers could try to re-resolve the hostname when attempting to connect to a lost ZK server, instead of caching the lookup indefinitely. Currently we must do a rolling restart of the ZK ensemble after swapping a node, which with three nodes means we periodically lose quorum.

The exact method we follow is to boot new instances in EC2 and attach one of a set of three Elastic IP addresses. External to EC2, this IP address remains the same and maps to whatever instance it is attached to. Internal to EC2, the Elastic IP hostname has a TTL of about 45-60 seconds and is remapped to the internal (10.x.y.z) address of the instance it is attached to. Therefore, in our case we would like ZK to pick up the new 10.x.y.z address that the Elastic IP hostname gets mapped to and reconnect appropriately.
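A sketch of the requested behaviour, assuming a hypothetical `PeerAddress` holder that keeps the configured hostname instead of only the `InetSocketAddress` resolved at startup. Constructing a new `InetSocketAddress(host, port)` performs a fresh lookup, so calling `reResolve()` before each reconnect attempt can pick up a remapped IP; if the lookup fails, the last good address is kept.

```java
import java.net.InetSocketAddress;

/** Hypothetical: remembers the hostname so a lost peer can be re-resolved. */
final class PeerAddress {
    private final String hostname;
    private final int port;
    private InetSocketAddress lastGood;

    PeerAddress(String hostname, int port) {
        this.hostname = hostname;
        this.port = port;
        this.lastGood = new InetSocketAddress(hostname, port); // may be unresolved at startup
    }

    /** Call before each (re)connect attempt to a lost peer. */
    synchronized InetSocketAddress reResolve() {
        InetSocketAddress fresh = new InetSocketAddress(hostname, port); // new lookup
        if (!fresh.isUnresolved()) {
            lastGood = fresh; // adopt the possibly remapped IP
        }
        return lastGood; // on lookup failure, keep the previous address
    }
}
```

The JVM keeps its own DNS cache, controlled by the `networkaddress.cache.ttl` security property, so re-resolving in code only helps if that cache's TTL is also short.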
patch 242199 No Perforce job exists for this issue. 14 12746
4 years, 24 weeks, 3 days ago Tests pass with this patch.
This patch is for the branch-3.4 branch ONLY.
0|i02jev:
ZooKeeper ZOOKEEPER-1505

Multi-thread CommitProcessor

Improvement Resolved Major Fixed Jay Shrauner Jay Shrauner Jay Shrauner 05/Jul/12 19:30   22/Dec/12 15:42 07/Dec/12 18:38 3.4.3, 3.4.4, 3.5.0 3.5.0 server   1 9   CommitProcessor has a single thread that both pulls requests off its queues and runs all downstream processors. This is noticeably inefficient for read-intensive workloads, which could be run concurrently. The trick is handling write transactions. I propose multi-threading this code according to the following two constraints:

- each session must see its requests responded to in order
- all committed transactions must be handled in zxid order, across all sessions

I believe these cover the only constraints we need to honor. In particular, I believe we can relax the following:

- it does not matter if the read request in one session happens before or after the write request in another session

With these constraints, I propose the following threads:

- 1 primary queue servicing/work dispatching thread
- 0-N assignable worker threads, where a given session is always assigned to the same worker thread

By always assigning a session to the same worker thread (using a simple sessionId mod number of worker threads), we guarantee the first constraint: requests we push onto a thread's queue are processed in order. We guarantee the second constraint by allowing only a single commit transaction in flight at a time; the queue servicing thread blocks while a commit transaction is in flight, and when the transaction completes it clears the flag.
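The assignment rule can be sketched in Java with one single-threaded executor per worker. This is an illustration, not the classes from the patch: `AssignableWorkers` is hypothetical, "0 workers" is assumed here to mean running inline, and the single-commit-in-flight flag that enforces the second constraint is omitted. Because a session always maps to the same single-threaded executor, its requests run in submission order while different sessions proceed in parallel. `Math.floorMod` keeps the index non-negative even for a negative session id, which a plain `%` would not.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of assignable, named, daemon worker threads. */
final class AssignableWorkers {
    private final ExecutorService[] workers;

    AssignableWorkers(int n, String namePrefix) {
        workers = new ExecutorService[n];
        for (int i = 0; i < n; i++) {
            final String threadName = namePrefix + "-" + i;
            workers[i] = Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r, threadName); // easy to spot in a thread dump
                t.setDaemon(true);                    // do not keep the JVM alive
                return t;
            });
        }
    }

    /** Index in [0, n) even for negative session ids. */
    int workerFor(long sessionId) {
        return (int) Math.floorMod(sessionId, (long) workers.length);
    }

    /** Same session -> same single-threaded executor -> handled in submission order. */
    void submit(long sessionId, Runnable request) {
        if (workers.length == 0) {
            request.run(); // assumed meaning of "0 workers": run on the dispatching thread
            return;
        }
        workers[workerFor(sessionId)].execute(request);
    }

    void shutdown() throws InterruptedException {
        for (ExecutorService w : workers) {
            w.shutdown();
            w.awaitTermination(5, TimeUnit.SECONDS);
        }
    }
}
```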

On a 32-core machine running Linux 2.6.38, we achieved the best performance with 32 worker threads, a 56% +/- 5% improvement in throughput (this improvement was measured on top of that for ZOOKEEPER-1504, not in isolation).

New classes introduced in this patch are:

WorkerService (also in ZOOKEEPER-1504): ExecutorService wrapper that makes worker threads daemon threads and names them in an easily debuggable manner. Supports assignable threads (as used here) and non-assignable threads (as used by NIOServerCnxnFactory).
performance, scaling 239679 No Perforce job exists for this issue. 4 2534
7 years, 13 weeks, 5 days ago
Reviewed
0|i00sen:
ZooKeeper ZOOKEEPER-1504

Multi-thread NIOServerCnxn

Improvement Resolved Major Fixed Thawan Kooburat Jay Shrauner Jay Shrauner 05/Jul/12 18:53   24/Jul/17 00:36 24/Jul/17 00:36   3.5.0, 3.5.1, 3.5.2, 3.5.3, 3.6.0 server   3 19   NIOServerCnxnFactory is single threaded, which doesn't scale well to large numbers of clients. This is particularly noticeable when thousands of clients connect. I propose multi-threading this code as follows:

- 1 acceptor thread, for accepting new connections
- 1-N selector threads
- 0-M I/O worker threads

The numbers of threads are configurable, with defaults scaling according to the number of cores. Communication with the selector threads is handled via LinkedBlockingQueues, and connections are permanently assigned to a particular selector thread so that all potentially blocking SelectionKey operations can be performed solely by the selector thread. An ExecutorService is used for the worker threads.

On a 32-core machine running Linux 2.6.38, we achieved the best performance with 4 selector threads and 64 worker threads, a 70% +/- 5% improvement in throughput.

This patch incorporates and supersedes the patches for:

https://issues.apache.org/jira/browse/ZOOKEEPER-517
https://issues.apache.org/jira/browse/ZOOKEEPER-1444

New classes introduced in this patch are:

- ExpiryQueue (from ZOOKEEPER-1444): factor out the logic from SessionTrackerImpl used to expire sessions so that the same logic can be used to expire connections
- RateLogger (from ZOOKEEPER-517): rate limit error message logging, currently only used to throttle rate of logging "out of file descriptors" errors
- WorkerService (also in ZOOKEEPER-1505): ExecutorService wrapper that makes worker threads daemon threads and names them in an easily debuggable manner. Supports assignable threads (as used by CommitProcessor) and non-assignable threads (as used here).
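The idea behind ExpiryQueue can be sketched as follows: each element's deadline is rounded up to the next multiple of a fixed interval, so elements that expire close together share one bucket and a reaper only needs to wake once per interval. `SimpleExpiryQueue` is a hypothetical, single-threaded simplification; the real class must be safe for concurrent use, which this sketch is not.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical, single-threaded sketch of interval-bucketed expiry. */
final class SimpleExpiryQueue<E> {
    private final long interval;
    private final Map<E, Long> deadlineOf = new HashMap<>();
    private final Map<Long, Set<E>> buckets = new HashMap<>();

    SimpleExpiryQueue(long intervalMs) {
        this.interval = intervalMs;
    }

    /** Rounds a time up to the next multiple of the interval (strictly later). */
    long roundToNextInterval(long time) {
        return (time / interval + 1) * interval;
    }

    /** Records activity: e now expires timeoutMs after now, rounded up to a bucket boundary. */
    void update(E e, long now, long timeoutMs) {
        long newDeadline = roundToNextInterval(now + timeoutMs);
        Long old = deadlineOf.put(e, newDeadline);
        if (old != null) {
            if (old == newDeadline) {
                return; // already in the right bucket
            }
            Set<E> bucket = buckets.get(old);
            bucket.remove(e);
            if (bucket.isEmpty()) {
                buckets.remove(old);
            }
        }
        buckets.computeIfAbsent(newDeadline, k -> new HashSet<>()).add(e);
    }

    /** Removes and returns every element whose bucket deadline is at or before now. */
    Set<E> expire(long now) {
        Set<E> expired = new HashSet<>();
        buckets.entrySet().removeIf(entry -> {
            if (entry.getKey() <= now) {
                expired.addAll(entry.getValue());
                return true;
            }
            return false;
        });
        for (E e : expired) {
            deadlineOf.remove(e);
        }
        return expired;
    }
}
```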
performance 239696 No Perforce job exists for this issue. 6 2559
2 years, 34 weeks, 3 days ago There is a possibility of a file descriptor leak under high workload. Please upgrade to the latest JVM version, or to a version that includes the fix for this bug (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7118373)
Incompatible change
1 0|i00sk7:
ZooKeeper ZOOKEEPER-1503

remove redundant JAAS configuration code in SaslAuthTest and SaslAuthFailTest

Improvement Resolved Major Fixed Eugene Joseph Koontz Eugene Joseph Koontz Eugene Joseph Koontz 05/Jul/12 14:03   31/Aug/12 22:12 01/Aug/12 15:23   3.4.4, 3.5.0     0 3   In SaslAuthTest and SaslAuthFailTest, we set the JAAS configuration twice with the same text string. This is confusing and redundant, since we need only set it once. https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1120//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html 242015 No Perforce job exists for this issue. 1 12500
7 years, 34 weeks, 1 day ago
Reviewed
0|i02hw7:
ZooKeeper ZOOKEEPER-1502

Prevent multiple zookeeper servers from using the same data directory

Improvement Resolved Major Won't Fix Rakesh Radhakrishnan Will Johnson Will Johnson 05/Jul/12 13:33   31/Mar/14 19:41 31/Mar/14 19:41 3.4.3 3.5.0 server   1 5   We recently ran into an issue where two ZooKeeper servers that were part of two separate quorums were configured to use the same data directory. Interestingly, the servers did not seem to complain, and both seemed to work fine until one of them was restarted. Once that happened, all sorts of chaos ensued. I understand that this is a misconfiguration, but should ZooKeeper complain about it, or do users need to protect themselves in some external fashion? Is a simple file lock enough, or are there other things I should take into consideration if it's up to me to handle? 242200 No Perforce job exists for this issue. 1 12747
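On the file-lock question, a simple approach is sketched below, assuming a hypothetical marker file `zk.lock` inside dataDir: the server holds an exclusive `FileChannel.tryLock()` for its lifetime, and a second server pointed at the same directory fails fast. `tryLock()` returns `null` when another process holds the lock and throws `OverlappingFileLockException` when the same JVM already holds it, so both cases are reported as an `IOException`.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical: exclusive ownership of a dataDir for the life of a server. */
final class DataDirLock implements AutoCloseable {
    private final FileChannel channel;
    private final FileLock lock;

    private DataDirLock(FileChannel channel, FileLock lock) {
        this.channel = channel;
        this.lock = lock;
    }

    static DataDirLock acquire(Path dataDir) throws IOException {
        FileChannel ch = FileChannel.open(dataDir.resolve("zk.lock"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock l;
        try {
            l = ch.tryLock(); // null if another process holds the lock
        } catch (OverlappingFileLockException e) {
            l = null; // already held by this same JVM
        } catch (IOException e) {
            ch.close();
            throw e;
        }
        if (l == null) {
            ch.close();
            throw new IOException("dataDir " + dataDir + " is already in use by another server");
        }
        return new DataDirLock(ch, l);
    }

    @Override
    public void close() throws IOException {
        lock.release();
        channel.close();
    }
}
```

Whether such a lock actually prevents access by other programs is platform dependent, and locks may be unreliable on network file systems, so this only protects against servers that also take the lock.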
5 years, 51 weeks, 3 days ago 0|i02jf3:
ZooKeeper ZOOKEEPER-1501

Nagios plugin always returns OK when it cannot connect to zookeeper

Bug Resolved Major Fixed Brian Sutherland Brian Sutherland Brian Sutherland 04/Jul/12 13:14   07/Sep/12 07:01 07/Sep/12 02:30 3.4.3 3.4.4, 3.5.0 contrib   0 4   Returning OK under such conditions is really not good... 242201 No Perforce job exists for this issue. 1 12748
7 years, 28 weeks, 6 days ago
Reviewed
0|i02jfb:
ZooKeeper ZOOKEEPER-1500

Nagios check always returns OK when the critical and warning values are the same

Bug Open Minor Unresolved Brian Sutherland Brian Sutherland Brian Sutherland 04/Jul/12 13:10   31/Jul/12 18:37       contrib   0 1   The plugin requires a difference between the warning and critical value for the checks to work. If the values are the same, OK is always returned.

I can't figure out how to attach a file to this ticket in JIRA, so here's a minimal inline patch that at least lets the admin know it's not working:

{noformat}
Index: src/contrib/monitoring/check_zookeeper.py
===================================================================
--- src/contrib/monitoring/check_zookeeper.py (revision 1357335)
+++ src/contrib/monitoring/check_zookeeper.py (working copy)
@@ -57,6 +57,10 @@
print >>sys.stderr, 'Invalid values for "warning" and "critical".'
return 2

+ if warning == critical:
+ print >>sys.stderr, '"warning" and "critical" cannot have the same value.'
+ return 2
+
if opts.key is None:
print >>sys.stderr, 'You should specify a key name.'
return 2
{noformat}
242202 No Perforce job exists for this issue. 0 12749
7 years, 34 weeks, 2 days ago 0|i02jfj:
ZooKeeper ZOOKEEPER-1499

clientPort config changes not backwards-compatible

Bug Resolved Blocker Fixed Alexander Shraer Camille Fournier Camille Fournier 03/Jul/12 18:33   24/Oct/13 07:08 24/Oct/13 01:21 3.5.0 3.5.0 server   0 5   With the new reconfig logic, clientPort=2181 in the zoo.cfg file no longer gets read, and clients can't connect without adding ;2181 to the end of their server lines. 242203 No Perforce job exists for this issue. 4 12750
6 years, 22 weeks ago
Reviewed
0|i02jfr:
ZooKeeper ZOOKEEPER-1498

Zab1.0 sends NEWLEADER packet twice

Bug Resolved Minor Duplicate Unassigned Camille Fournier Camille Fournier 03/Jul/12 17:25   03/Jan/13 21:29 03/Jan/13 21:29 3.4.3, 3.5.0   server   0 3   In pre-Zab1.0, we would process the NEWLEADER packet in registerWithLeader. Now we only process it in syncWithLeader, and in certain circumstances (the first follower of a new leader) it seems like we get 2 of them, which causes 2 snapshots to be taken one right after another. Not sure whether we should ignore taking the snapshot the second time, or not send two packets, or what. 242204 No Perforce job exists for this issue. 0 12751
7 years, 11 weeks, 6 days ago 0|i02jfz:
ZooKeeper ZOOKEEPER-1497

Allow server-side SASL login with JAAS configuration to be programmatically set (rather than only by reading JAAS configuration file)

Improvement Resolved Major Fixed Matteo Bertozzi Matteo Bertozzi Matteo Bertozzi 03/Jul/12 17:08   26/Sep/12 14:17 30/Aug/12 14:30 3.4.3, 3.5.0 3.4.4, 3.5.0 server   0 5   Currently the CnxnFactory checks for "java.security.auth.login.config" to decide whether or not to enable SASL.
* zookeeper/server/NIOServerCnxnFactory.java
* zookeeper/server/NettyServerCnxnFactory.java
** configure() checks for "java.security.auth.login.config"
*** If present start the new Login("Server", SaslServerCallbackHandler(conf))

But since SaslServerCallbackHandler already does the right thing by checking whether getAppConfigurationEntry() is empty, we can allow SASL with a programmatically set JAAS configuration just by checking whether a configuration entry is present, instead of checking for "java.security.auth.login.config".
(Something quite similar was done for the SaslClient in ZOOKEEPER-1373)
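A sketch of the proposed check, using only the standard `javax.security.auth.login.Configuration` API: SASL is considered enabled when the installed configuration has a non-empty entry for the `Server` section, whether that configuration came from the file named by `java.security.auth.login.config` or was installed programmatically with `Configuration.setConfiguration`. `SaslConfigCheck` and its helper are hypothetical names, and the login module class name passed to the helper is never loaded by this code.

```java
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;

/** Hypothetical helpers for deciding whether server-side SASL is configured. */
final class SaslConfigCheck {
    /** True if the installed JAAS Configuration has a non-empty entry for the section. */
    static boolean hasSection(String section) {
        AppConfigurationEntry[] entries;
        try {
            entries = Configuration.getConfiguration().getAppConfigurationEntry(section);
        } catch (SecurityException e) {
            return false; // no JAAS configuration could be loaded at all
        }
        return entries != null && entries.length > 0;
    }

    /** Builds an in-memory Configuration exposing a single section (illustrative). */
    static Configuration singleSection(String section, String loginModule, Map<String, ?> options) {
        AppConfigurationEntry entry = new AppConfigurationEntry(
                loginModule, AppConfigurationEntry.LoginModuleControlFlag.REQUIRED, options);
        return new Configuration() {
            @Override
            public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
                return section.equals(name) ? new AppConfigurationEntry[] {entry} : null;
            }
        };
    }
}
```

For example, `Configuration.setConfiguration(SaslConfigCheck.singleSection("Server", "org.example.MyLoginModule", Collections.emptyMap()))` makes `hasSection("Server")` return true without any JAAS file on disk.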
security 242013 No Perforce job exists for this issue. 5 12498
7 years, 30 weeks ago
Reviewed
0|i02hvr:
ZooKeeper ZOOKEEPER-1496

Ephemeral node not getting cleared even after client has exited

Bug Resolved Critical Fixed Rakesh Radhakrishnan suja s suja s 28/Jun/12 06:24   17/Sep/12 07:02 17/Sep/12 03:58 3.4.3 3.4.4, 3.5.0 server   0 9   In one of the tests we performed, we came across a case where the ephemeral node was not cleared from ZooKeeper even though the client had exited.

Zk version: 3.4.3

Ephemeral node still exists in Zookeeper:

HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date

Tue Jun 26 16:07:04 IST 2012
HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server xx.xx.xx.55:2182
Connecting to xx.xx.xx.55:2182
Welcome to ZooKeeper!
JLine support is enabled
[zk: xx.xx.xx.55:2182(CONNECTING) 0]
WATCHER::

WatchedEvent state:SyncConnected type:None path:null

[zk: xx.xx.xx.55:2182(CONNECTED) 0] get /hadoop-ha/hacluster/ActiveStandbyElectorLock

haclusternn2HOSt-xx-xx-xx-102 ��
cZxid = 0x200000075
ctime = Tue Jun 26 13:10:19 IST 2012
mZxid = 0x200000075
mtime = Tue Jun 26 13:10:19 IST 2012
pZxid = 0x200000075
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x1382791d4e50004
dataLength = 42
numChildren = 0
[zk: xx.xx.xx.55:2182(CONNECTED) 1]

Grepped the logs on the ZK side for session "0x1382791d4e50004": the closeSession request is followed by a create request that arrives before the closeSession has been processed.

HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000074
2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a
2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a
2012-06-26 13:10:19,919 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a
2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a

HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000075
2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a
2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a
2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a
2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a


The closeSession and create requests arrive almost in parallel.


Env:
Hadoop setup.
We were using Namenode HA with bookkeeper as shared storage and auto failover enabled.
NN102 was active and NN55 was standby.
FailoverController at 102 got shut down due to a ZK connection error.
The lock ActiveStandbyElectorLock (an ephemeral node) created by this FailoverController is not cleared from ZK.
242205 No Perforce job exists for this issue. 5 12752
7 years, 27 weeks, 3 days ago
Reviewed
0|i02jg7:
ZooKeeper ZOOKEEPER-1495

ZK client hangs when using a function not available on the server.

Bug Closed Minor Fixed Nicolas Liochon Nicolas Liochon Nicolas Liochon 28/Jun/12 03:51   13/Mar/14 14:16 24/Jan/13 20:37 3.4.2, 3.3.5 3.4.6, 3.5.0 server   0 7   all This happens for example when using zk#multi with a 3.4 client but a 3.3 server.

The issue seems to be on the server side: the server drops packets with an unknown OpCode in ZooKeeperServer#submitRequest:
{noformat}
public void submitRequest(Request si) {
    // snip
    try {
        touch(si.cnxn);
        boolean validpacket = Request.isValid(si.type); // ===> Check on case OpCode.*
        if (validpacket) {
            // snip
        } else {
            LOG.warn("Dropping packet at server of type " + si.type);
            // if invalid packet drop the packet.
        }
    } catch (MissingSessionException e) {
        if (LOG.isDebugEnabled()) {
            LOG.debug("Dropping request: " + e.getMessage());
        }
    }
}
{noformat}

The solution discussed in ZOOKEEPER-1381 would be to raise an exception on the client side and then close the session.
242206 No Perforce job exists for this issue. 4 12753
6 years, 2 weeks ago
Reviewed
0|i02jgf:
ZooKeeper ZOOKEEPER-1494

C client: socket leak after receive timeout in zookeeper_interest()

Bug Resolved Major Fixed Michi Mutsuzaki Michi Mutsuzaki Michi Mutsuzaki 22/Jun/12 15:56   10/Sep/12 07:01 10/Sep/12 03:04 3.4.2, 3.3.5 3.4.4, 3.5.0 c client   0 5   In zookeeper_interest(), we set zk->fd to -1 without closing it when a timeout happens. Instead, we should let the handle_socket_error_msg() function take care of closing the socket properly.

--Michi
242207 No Perforce job exists for this issue. 3 12754
7 years, 28 weeks, 3 days ago
Reviewed
0|i02jgn:
ZooKeeper ZOOKEEPER-1493

C Client: zookeeper_process doesn't invoke completion callback if zookeeper_close has been called

Bug Resolved Major Fixed Michi Mutsuzaki Michi Mutsuzaki Michi Mutsuzaki 20/Jun/12 16:47   21/Nov/12 05:12 29/Jul/12 01:35 3.4.3, 3.3.5 3.3.6, 3.4.4, 3.5.0 c client   0 7   In ZOOKEEPER-804, we added a check in zookeeper_process() to see if zookeeper_close() has been called. This was to avoid calling assert(cptr) on a NULL pointer, as dequeue_completion() returns NULL if the sent_requests queue has been cleared by free_completion() from zookeeper_close(). However, we should still call the completion if it is not NULL. 242208 No Perforce job exists for this issue. 3 12755
7 years, 18 weeks, 1 day ago
Reviewed
0|i02jgv:
ZooKeeper ZOOKEEPER-1492

leader cannot switch to LOOKING state when lost the majority

Bug Resolved Critical Duplicate Unassigned gaoxiao gaoxiao 20/Jun/12 08:51   20/Jun/12 10:18 20/Jun/12 10:18 3.4.3   quorum   0 2 604800 604800 0% eclipse linux When a follower leaves the cluster and the cluster can no longer achieve a majority, the leader should leave the LEADING state and enter the LOOKING state. However, if there are observers, the leader stays in the LEADING state and clients cannot use the cluster.

e.g.:

The servers config:

server.1=z1:2888:3888
server.2=z2:2888:3888
server.3=z3:2888:3888:observer

At first, servers 1, 2 and 3 are all started and everything works, with 2 as the leader. If 1 is then stopped, 2 does not leave the LEADING state, and clients cannot connect to the cluster.

I think the problem is:
(Leader.java method:lead)

Line 388-407
syncedSet.add(self.getId());
synchronized (learners) {
    for (LearnerHandler f : learners) {
        if (f.synced()) {
            syncedCount++;
            syncedSet.add(f.getSid());
        }
        f.ping();
    }
}
if (!tickSkip && !self.getQuorumVerifier().containsQuorum(syncedSet)) {
//if (!tickSkip && syncedCount < self.quorumPeers.size() / 2) {
    // Lost quorum, shutdown
    // TODO: message is wrong unless majority quorums used
    shutdown("Only " + syncedCount + " followers, need "
            + (self.getVotingView().size() / 2));
    // make sure the order is the same!
    // the leader goes to looking
    return;
}

The code adds every synced learner, observers included, to syncedSet. I think only followers should be added here, so that the containsQuorum method can correctly determine whether a majority is present.
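The proposed fix can be illustrated with a voting-only majority check. This is a sketch with a hypothetical `containsVotingMajority` helper, not ZooKeeper's `QuorumVerifier`: observers may appear in the synced set (they are pinged too) but are never counted.

```java
import java.util.Set;

/** Hypothetical majority check that ignores observers. */
final class QuorumCheck {
    /** True if more than half of the voting members are in syncedSids. */
    static boolean containsVotingMajority(Set<Long> syncedSids, Set<Long> votingSids) {
        int votes = 0;
        for (long sid : syncedSids) {
            if (votingSids.contains(sid)) {
                votes++; // only participants vote; observers are skipped
            }
        }
        return votes > votingSids.size() / 2;
    }
}
```

With the configuration above (voters 1 and 2, observer 3), after server 1 stops the synced set is {2, 3}: a check on the set's size alone (2 > 1) keeps the leader up, while the voting-only check counts one vote out of two and reports a lost quorum.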
0% 0% 604800 604800 242209 No Perforce job exists for this issue. 0 12756
7 years, 40 weeks, 1 day ago 0|i02jh3:
ZooKeeper ZOOKEEPER-1491

Help for create command in zkCli is misleading

Bug Resolved Major Duplicate Unassigned Keith Turner Keith Turner 19/Jun/12 14:59   13/Dec/12 17:19 13/Dec/12 17:19 3.3.3       0 2   When I type help in the shell, I see the following for the create command.

{noformat}
create [-s] [-e] path data acl
{noformat}

However, the ACL is optional. So I think the usage message should look like the following.

{noformat}
create [-s] [-e] path data [acl]
{noformat}

242210 No Perforce job exists for this issue. 0 12757
7 years, 15 weeks ago 0|i02jhb:
ZooKeeper ZOOKEEPER-1490

If the configured log directory does not exist zookeeper will not start. Better to create the directory and start

Bug Resolved Minor Fixed suja s suja s suja s 19/Jun/12 00:10   30/Jun/12 07:01 30/Jun/12 02:08   3.4.4, 3.5.0 scripts   0 8   If the configured log directory does not exist, ZooKeeper will not start. It would be better to create the directory and start.
In zkEnv.sh we can change it as follows:

if [ "x${ZOO_LOG_DIR}" = "x" ]
then
    ZOO_LOG_DIR="."
else
    if [ ! -w "$ZOO_LOG_DIR" ] ; then
        mkdir -p "$ZOO_LOG_DIR"
    fi
fi


242211 No Perforce job exists for this issue. 3 12758
7 years, 38 weeks, 5 days ago
Reviewed
0|i02jhj:
ZooKeeper ZOOKEEPER-1489

Data loss after truncate on transaction log

Bug Resolved Blocker Fixed Patrick D. Hunt Christian Ziech Christian Ziech 18/Jun/12 05:09   18/Jul/12 07:01 17/Jul/12 17:29 3.4.3, 3.3.5 3.3.6, 3.4.4, 3.5.0 server   0 9   Tested on Ubuntu 12.04 and CentOS 6, should be reproducible elsewhere The truncate method on the transaction log in the class org.apache.zookeeper.server.persistence.FileTxnLog will reduce the file size to the required amount without either closing or re-positioning the logStream (which could also be dangerous since the truncate method is not synchronized against concurrent writes to the log).

This causes the next append to that log to create a small "hole" in the file, which Java reads back as binary zeroes. This then causes the FileTxnIterator.next() implementation to detect the end of the log file too early.

I'll attach a small Maven project with one JUnit test that can be used to reproduce the issue. Unfortunately, due to the black-box nature of the test, it runs for roughly 50 seconds.

Steps to reproduce:
- Start an ensemble of zookeeper servers with at least 3 participants
- Create one entry, then remove one of the servers from the ensemble temporarily (e.g. zk-2)
- Create another entry, which is hence only reflected on zk-1 and zk-3
- Take zk-1 out of the ensemble without shutting it down (this is important; I did it by interrupting the network connection to that node) and clean zk-3
- Bring back zk-2 and zk-3 so that they form a quorum
- Allow zk-1 to connect again
- zk-1 will receive a TRUNC message from zk-2, since zk-1 is now the only server that knows about the second node creation event
- Create a third node
- Force zk-1 to become leader somehow
- That third node will be gone
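The underlying failure mode can be reproduced in a few lines of plain Java, independent of ZooKeeper's classes (`TruncateDemo` is hypothetical). Truncating through a second handle leaves the writer's file position past the new end of file, so the next write leaves a zero-filled gap. Truncating through the writer's own `FileChannel` also pulls the position back, because `FileChannel.truncate` resets a position that lies beyond the new size, and a `FileOutputStream` shares its position with its channel.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical demo of truncating a file that is still open for writing. */
final class TruncateDemo {
    private static final byte[] TEN_BYTES = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

    /** Truncates via a second handle: the writer's position stays at 10, past the new EOF. */
    static byte[] truncateViaSecondHandle(Path f) throws IOException {
        try (FileOutputStream out = new FileOutputStream(f.toFile())) {
            out.write(TEN_BYTES);
            try (RandomAccessFile raf = new RandomAccessFile(f.toFile(), "rw")) {
                raf.setLength(5);
            }
            out.write(99); // written at offset 10, so bytes 5..9 read back as zeros
        }
        return Files.readAllBytes(f);
    }

    /** Truncates via the writer's own channel, which moves its position back to 5. */
    static byte[] truncateViaWriterChannel(Path f) throws IOException {
        try (FileOutputStream out = new FileOutputStream(f.toFile())) {
            out.write(TEN_BYTES);
            out.getChannel().truncate(5);
            out.write(99); // written at offset 5, directly after the retained bytes
        }
        return Files.readAllBytes(f);
    }
}
```

A buffered writer would also need to be flushed before truncating, and truncation still has to be synchronized with concurrent appends, as the description points out.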
242018 No Perforce job exists for this issue. 14 12506
7 years, 36 weeks, 1 day ago
Reviewed
0|i02hxj:
ZooKeeper ZOOKEEPER-1488

Some links are not working in the Zookeeper Documentation

Bug Open Minor Unresolved Unassigned Kiran BC Kiran BC 15/Jun/12 02:13   01/Jul/15 19:53   3.4.3   documentation   0 3   There are some internal link errors in the Zookeeper documentation. The list is as follows:
docs\zookeeperAdmin.html -> tickTime and datadir
docs\zookeeperOver.html -> fg_zkComponents, fg_zkPerfReliability and fg_zkPerfRW
docs\zookeeperStarted.html -> Logging
242212 No Perforce job exists for this issue. 0 12759
7 years, 14 weeks, 2 days ago 0|i02jhr:
ZooKeeper ZOOKEEPER-1487

If log4j.properties configuration parameters are not overridden by system properties, ZooKeeper is not able to create the log file.

Bug Resolved Major Invalid Unassigned Surendra Singh Lilhore Surendra Singh Lilhore 15/Jun/12 01:57   18/Jun/12 10:07 18/Jun/12 10:07     server   0 2   In [ZOOKEEPER-980|https://issues.apache.org/jira/browse/ZOOKEEPER-980], log4j.properties provides some properties that may be overridden using system properties.
For example:
JVMFLAGS="-Dzookeeper.root.logger=DEBUG,CONSOLE,ROLLINGFILE -Dzookeeper.console.threshold=DEBUG" bin/zkServer.sh start
But if we do not override these properties using system properties, ZooKeeper is not able to create the log file, which means these properties are not taking their default values.
242213 No Perforce job exists for this issue. 0 12760
7 years, 40 weeks, 3 days ago 0|i02jhz:
ZooKeeper ZOOKEEPER-1486

A couple of bugs in the tutorial code

Bug Open Minor Unresolved Unassigned Dmitri Perelman Dmitri Perelman 12/Jun/12 17:07   20/Jul/12 18:11       documentation   1 2   Hi,
There are two problems with the barrier example code in the tutorial:

1) A znode created by a process in the function enter() is created with SEQUENTIAL suffix, however, the name of a znode deleted in the function leave() doesn't have this suffix. Actually, the leave() function tries to delete a nonexistent node => a KeeperException is thrown, which is caught silently => the process terminates without waiting for the barrier.

2) It seems that the very idea of leaving the barrier by deleting ephemeral nodes is problematic. Consider the following scenario: there are two clients: C1 and C2.
- C1 enters the barrier, creates a znode /b1/C1, checks that it's alone and starts waiting for the second client to come.
- C2 enters the barrier and creates a znode /b1/C2 - the notification to C1 is sent but still not delivered.
- C2 observes that there are enough children to /b1, enters the barrier, executes its own operations and invokes leave() procedure.
- during the leave() procedure C2 removes its znode /b1/C2 and exits.
- when the notification about C2's arrival finally reaches C1, C1 checks the children of /b1 and doesn't find C2's znode: C1 is stuck.
The solution to this data race would be to create special znodes for leaving the barrier, similarly to the way they are created for entering the barrier.

Thanks,
Dima
242214 No Perforce job exists for this issue. 0 12761
7 years, 35 weeks, 6 days ago 0|i02ji7:
ZooKeeper ZOOKEEPER-1485

client xid overflow is not handled

Bug Open Major Unresolved Martin Kuchta Michi Mutsuzaki Michi Mutsuzaki 12/Jun/12 14:32   08/Jul/16 12:29   3.4.3, 3.3.5   c client, java client   0 9   Both Java and C clients use signed 32-bit int as XIDs. XIDs are assumed to be non-negative, and zookeeper uses some negative values as special XIDs (e.g. -2 for ping, -4 for auth). However, neither Java nor C client ensures the XIDs it generates are non-negative, and the server doesn't reject negative XIDs.

Pat had some suggestions on how to fix this:

- (bin-compat) Expire the session when the client sends a negative XID.
- (bin-incompat) In addition to expiring the session, use 64-bit int for XID so that overflow will practically never happen.

--Michi
242215 No Perforce job exists for this issue. 1 12762
3 years, 36 weeks, 6 days ago 0|i02jif:
ZooKeeper ZOOKEEPER-1484

Missing znode found in the follower

Bug Resolved Critical Invalid Thawan Kooburat Thawan Kooburat Thawan Kooburat 11/Jun/12 17:36   15/Jun/12 22:04 15/Jun/12 22:04 3.4.3   server   0 0   We noticed that one of the followers failed to restart due to a missing parent node

{noformat}
2012-05-29 15:44:41,037 [myid:9] - INFO [main:FileSnap@83] - Reading snapshot /var/facebook/zeus-server/data/global-ropt.0/version-2/snapshot.3d001f19c9
2012-05-29 15:44:43,300 [myid:9] - ERROR [main:FileTxnSnapLog@220] - Parent /phpunittest/1862297546 missing for /phpunittest/1862297546/dir1
2012-05-29 15:44:43,302 [myid:9] - ERROR [main:QuorumPeer@488] - Unable to load database on disk
java.io.IOException: Failed to process transaction type: 1 error: KeeperErrorCode = NoNode for /phpunittest/1862297546
{noformat}

We believe the root cause is a bug in the follower sync-up logic: due to a race condition, the follower may miss some proposals. The log below shows that the follower sees the commit message but has not seen the proposal before.
{noformat}
2012-05-15 15:11:27,449 [myid:13] - WARN [QuorumPeer[myid=13]/0.0.0.0:2182:Learner@378] - Got zxid 0x3c00282dc9 expected 0x3c00282dca
{noformat}

I can reproduce this by repeatedly running FollowerResyncConcurrencyTest until a failure occurs. I suspect the root cause is how we handle toBeApplied and outstandingProposals in the leader.

1. In-flight proposals are removed from outstandingProposals before they are added to toBeApplied. Most of the problems I have seen so far seem to be caused by this gap.
2. startForwarding() iterates through outstandingProposals without properly locking PrepRequestProcessor, so an in-flight proposal may be missed.

242216 No Perforce job exists for this issue. 0 12763
7 years, 41 weeks, 3 days ago Trunk seems to be OK. We found that our own effort to increase concurrency on the leader caused the issue. 0|i02jin:
ZooKeeper ZOOKEEPER-1483

Fix leader election recipe documentation

Bug Resolved Major Fixed Michi Mutsuzaki Ankur Bansal Ankur Bansal 11/Jun/12 17:09   14/Dec/12 17:11 14/Sep/12 03:34 3.4.3 3.4.4, 3.5.0 documentation   0 5   The leader election recipe documentation suggests that, to avoid the herd effect, a client process volunteering for leadership via child znode [i] under the leader election path [/leader] must only watch the SMALLEST znode [j] from a different client process such that [j < i].

This will NOT avoid the herd effect as many clients will end up watching the same znode[j] where j is the next-in-sequence number greater than the number of the current leader.

Specifically in Step 3 of the Election procedure here http://zookeeper.apache.org/doc/trunk/recipes.html#sc_leaderElection

This "where j is the SMALLEST sequence number" should be changed to this
"where j is the LARGEST sequence number"
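The corrected rule (watch the largest sequence number below your own) can be sketched as a small helper; `ElectionWatch` is hypothetical and works on sequence numbers parsed from the child names. Each volunteer watches only its immediate predecessor, so deleting one znode notifies a single client instead of the whole herd.

```java
import java.util.List;
import java.util.OptionalLong;

/** Hypothetical helpers for choosing which election znode to watch. */
final class ElectionWatch {
    /** Largest sequence number strictly below mine (the znode to watch); empty means I lead. */
    static OptionalLong predecessor(List<Long> sequenceNumbers, long mine) {
        OptionalLong best = OptionalLong.empty();
        for (long j : sequenceNumbers) {
            if (j < mine && (!best.isPresent() || j > best.getAsLong())) {
                best = OptionalLong.of(j);
            }
        }
        return best;
    }

    /** Parses the 10-digit, zero-padded counter ZooKeeper appends to sequential znode names. */
    static long sequenceOf(String znodeName) {
        return Long.parseLong(znodeName.substring(znodeName.length() - 10));
    }
}
```

For children with sequence numbers 1, 3, 7 and 9, the process that owns 7 watches 3, and the process that owns 1 finds no smaller number and becomes the leader.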
242217 No Perforce job exists for this issue. 2 12764
7 years, 27 weeks, 6 days ago 0|i02jiv:
ZooKeeper ZOOKEEPER-1482

Batch get to improve performance

New Feature Resolved Major Duplicate zhiyuan.dai zhiyuan.dai zhiyuan.dai 11/Jun/12 02:14   21/May/14 16:12 21/May/14 16:12 3.3.2, 3.4.3 3.5.0, 4.0.0 server   0 7   Currently, ZooKeeper doesn't have a batch get feature, so I added one.
The method is getChildrenData; it fetches the data of a znode's children in one call.
242218 No Perforce job exists for this issue. 1 12765
5 years, 44 weeks, 1 day ago 0|i02jj3:
ZooKeeper ZOOKEEPER-1481

allow the C cli to run exists with a watcher

Improvement Resolved Major Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 08/Jun/12 19:07   31/Aug/12 07:02 30/Aug/12 16:39 3.4.3 3.4.4, 3.5.0 c client   0 3   Adds a wexists command and also improves the stdout (type string rather than just the number). Granted, wexists is more for testing purposes than strictly necessary (we already have exists), but it is still worthwhile to add imo. 242219 No Perforce job exists for this issue. 1 12766
7 years, 29 weeks, 6 days ago 0|i02jjb:
ZooKeeper ZOOKEEPER-1480

ClientCnxn(1161) can't get the current ZK server address, so it logs "Session 0x for server null, unexpected error"

Bug Open Major Unresolved Leader Ni Leader Ni Leader Ni 05/Jun/12 22:00   05/Feb/20 07:16   3.4.3 3.7.0, 3.5.8 java client 27/Jun/12 0 3     When ZooKeeper encounters an unexpected error (not SessionExpiredException, SessionTimeoutException, or EndOfStreamException), ClientCnxn (line 1161) logs a message in the format "Session 0x for server null, unexpected error, closing socket connection and attempting reconnect". The log statement is at line 1161 in zookeeper-3.3.3.
  We found that ZooKeeper uses "((SocketChannel)sockKey.channel()).socket().getRemoteSocketAddress()" to get the server address. But sometimes it logs "Session 0x for server null", and when the address is logged as null, developers can't determine which ZooKeeper server the client is connected or connecting to.
  I added a method to class SendThread: InetSocketAddress org.apache.zookeeper.ClientCnxn.SendThread.getCurrentZooKeeperAddr().

  Here:
/**
 * Returns the address to which the socket is connected.
 *
 * @return ip address of the remote side of the connection or null if not
 *         connected
 */
@Override
SocketAddress getRemoteSocketAddress() {
    // a lot could go wrong here, so rather than put in a bunch of code
    // to check for nulls all down the chain let's do it the simple
    // yet bulletproof way
    .....
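The body of getCurrentZooKeeperAddr() is not shown above, so the following is only a hypothetical alternative: prefer `SocketChannel.getRemoteAddress()`, which returns `null` for an unconnected channel and throws `ClosedChannelException` for a closed one, and fall back to the address the client was trying to reach so the log line never says "server null". `RemoteAddr` is an illustrative name.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.nio.channels.SocketChannel;

/** Hypothetical helper for a never-null server address in log messages. */
final class RemoteAddr {
    static SocketAddress describe(SocketChannel ch, InetSocketAddress attempted) {
        if (ch != null) {
            try {
                SocketAddress remote = ch.getRemoteAddress(); // null if not connected
                if (remote != null) {
                    return remote;
                }
            } catch (IOException e) {
                // ClosedChannelException or another I/O error: use the fallback below
            }
        }
        return attempted;
    }
}
```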
client, getCurrentZooKeeperAddr 242220 No Perforce job exists for this issue. 2 12767
6 years, 24 weeks, 2 days ago client,zookeeper addr,server 0|i02jjj:
ZooKeeper ZOOKEEPER-1479

C Client: zoo_add_auth() doesn't wake up the IO thread

Bug Open Major Unresolved Unassigned Michi Mutsuzaki Michi Mutsuzaki 03/Jun/12 20:48   05/Feb/20 07:15   3.4.3 3.7.0, 3.5.8 c client   0 3   It can take up to sessionTimeout / 3 for the IO thread to send out the auth packet. The {{zoo_add_auth()}} function should call {{adaptor_send_queue(zh, 0)}} after calling {{send_last_auth_info(zh)}}.

--Michi
242221 No Perforce job exists for this issue. 0 12768
7 years, 42 weeks, 2 days ago 0|i02jjr:
ZooKeeper ZOOKEEPER-1478

Small bug in QuorumTest.testFollowersStartAfterLeader( )

Bug Closed Minor Fixed Alexander Shraer Alexander Shraer Alexander Shraer 02/Jun/12 23:03   13/Mar/14 14:16 13/Dec/12 02:19 3.4.3 3.4.6, 3.5.0 tests   0 6   The following code appears in QuorumTest.testFollowersStartAfterLeader( ):

{code}
for (int i = 0; i < 30; i++) {
    try {
        zk.create("/test", "test".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE,
                CreateMode.PERSISTENT);
        break;
    } catch (KeeperException.ConnectionLossException e) {
        Thread.sleep(1000);
    }
    // test fails if we still can't connect to the quorum after 30 seconds.
    Assert.fail("client could not connect to reestablished quorum: giving up after 30+ seconds.");
}
{code}

From the comment it looks like the intention was to try to reconnect 30 times and only then trigger the Assert, but that's not what this does.
After we fail to connect once and Thread.sleep is executed, Assert.fail will be executed without retrying create.
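Per the description, the intended behavior only needs the failure moved below the loop. A self-contained sketch of that shape; the exception class and the {{retry}} helper are hypothetical stand-ins so it compiles without ZooKeeper on the classpath.

```java
// Self-contained sketch: ConnectionLossException stands in for
// KeeperException.ConnectionLossException so this compiles without ZooKeeper.
final class RetryUntilConnected {
    static final class ConnectionLossException extends Exception {
        ConnectionLossException(String message) {
            super(message);
        }
    }

    // Any operation that may fail with a (stand-in) connection loss.
    interface Op<T> {
        T call() throws Exception;
    }

    static <T> T retry(Op<T> op, int attempts, long sleepMillis) throws Exception {
        for (int i = 0; i < attempts; i++) {
            try {
                return op.call(); // success ends the loop immediately
            } catch (ConnectionLossException e) {
                Thread.sleep(sleepMillis); // wait, then try again
            }
        }
        // Reached only after every attempt failed: this is where Assert.fail belongs.
        throw new AssertionError("client could not connect to reestablished quorum: giving up after "
                + attempts + " attempts.");
    }
}
```

Because a successful attempt returns from inside the loop, reaching the statement after the loop means all attempts failed, which is exactly the condition the original comment describes.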
239613 No Perforce job exists for this issue. 5 2416
6 years, 2 weeks ago
Reviewed
0|i00rof:
ZooKeeper ZOOKEEPER-1477

Test failures with Java 7 on Mac OS X

Bug Resolved Major Not A Problem Unassigned Diwaker Gupta Diwaker Gupta 01/Jun/12 15:52   08/Oct/13 12:10 31/Aug/13 14:11 3.4.3   server, tests   8 20   Mac OS X Lion (10.7.4)
Java version:
java version "1.7.0_04"
Java(TM) SE Runtime Environment (build 1.7.0_04-b21)
Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode)
I downloaded ZK 3.4.3 sources and ran {{ant test}}. Many of the tests failed, including ZooKeeperTest. A common symptom was spurious {{ConnectionLossException}}:

{code}
2012-06-01 12:01:23,420 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@54] - TEST METHOD FAILED testDeleteRecursiveAsync
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
at org.apache.zookeeper.ZooKeeperTest.testDeleteRecursiveAsync(ZooKeeperTest.java:77)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
... (snipped)
{code}

As background, I was actually investigating some non-deterministic failures when using Netflix's Curator with Java 7 (see https://github.com/Netflix/curator/issues/79). After a while, I figured I should establish a clean ZK baseline first and realized it is actually a ZK issue, not a Curator issue.

We are trying to migrate to Java 7 but this is a blocking issue for us right now.
242222 No Perforce job exists for this issue. 1 12769 6 years, 29 weeks, 5 days ago 0|i02jjz:
ZooKeeper ZOOKEEPER-1476

ipv6 reverse dns related timeouts on OSX connecting to localhost

Bug Open Minor Unresolved Unassigned Jilles van Gurp Jilles van Gurp 01/Jun/12 09:01   03/Jul/14 18:34           2 7   We observed a weird, random issue trying to create ZooKeeper client connections on OS X. Sometimes it would work and sometimes it would fail, and it was also randomly very slow. It turns out both issues have the same cause.

My hosts file on OS X (an unmodified default one) lists three entries for localhost:

127.0.0.1 localhost
::1 localhost
fe80::1%lo0 localhost

We saw zookeeper trying to connect to fe80:0:0:0:0:0:0:1%1 sometimes, which is not listed (actually one in four times, it seems to round robin over the addresses).

Whenever that happens, it sometimes works and sometimes fails. In both cases it's very slow. Reason: the reverse lookup for fe80:0:0:0:0:0:0:1%1 can't be resolved using the hosts file, so it falls back to actually using DNS. Sometimes that works, but other times it fails or times out after about 5 seconds. Platform-specific DNS settings probably hide this problem on Linux.

As a workaround, we now pre-resolve localhost: Inet4Address.getByName("localhost"). This always resolves to 127.0.0.1 on my machine and is fast.
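A note on the workaround: {{Inet4Address.getByName}} is simply the static {{InetAddress.getByName}} reached through a subclass, so it is not restricted to IPv4 and returns whatever address the resolver lists first. A slightly more defensive sketch picks the first IPv4 result explicitly ({{LocalhostResolver}} is a hypothetical helper, not ZooKeeper code).

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper: resolve "localhost" to an IPv4 address only.
final class LocalhostResolver {
    static Inet4Address ipv4Localhost() throws UnknownHostException {
        try {
            // getAllByName returns every address the resolver knows for the name.
            for (InetAddress candidate : InetAddress.getAllByName("localhost")) {
                if (candidate instanceof Inet4Address) {
                    return (Inet4Address) candidate;
                }
            }
        } catch (UnknownHostException e) {
            // fall through to the literal loopback address below
        }
        // 4-byte address, so this is always an Inet4Address; no DNS lookup involved.
        return (Inet4Address) InetAddress.getByAddress("localhost", new byte[] {127, 0, 0, 1});
    }

    // Builds "host:port" for a ZooKeeper connect string from the IPv4 loopback address.
    static String connectString(int port) throws UnknownHostException {
        return ipv4Localhost().getHostAddress() + ":" + port;
    }
}
```

Passing the numeric address in the connect string avoids depending on which localhost entry the resolver happens to return.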

This fixes the issue for us. We're not sure where the fe80:0:0:0:0:0:0:1%1 address comes from though. I don't recall having this issue with other server side software so this might be a mix of platform setup, osx specific defaults, and zookeeper behavior.

I've seen one ticket that relates to ipv6 in zookeeper that might be related: ZOOKEEPER-667. Perhaps the workaround for that ticket introduced this problem?


242223 No Perforce job exists for this issue. 0 12770
5 years, 38 weeks ago 0|i02jk7:
ZooKeeper ZOOKEEPER-1475

Messages about missing JAAS configuration should not be logged at WARN level

Improvement Open Major Unresolved Unassigned Andrew Kyle Purtell Andrew Kyle Purtell 31/May/12 15:08   10/Sep/12 00:26           0 3   Messages about unconfigured JAAS settings probably should not be logged at WARN level, because a missing JAAS configuration is intentional if the user is not using any SASL-based security features. At WARN level, the user may conclude that security is not optional, or that the missing JAAS configuration is behind failures that have an unrelated cause. Perhaps INFO level instead. 242224 No Perforce job exists for this issue. 0 12771
7 years, 28 weeks, 3 days ago 0|i02jkf:
ZooKeeper ZOOKEEPER-1474

Cannot build Zookeeper with IBM Java: use of Sun MXBean classes

Bug Closed Major Fixed Paulo Ricardo Paz Vital Adalberto Medeiros Adalberto Medeiros 30/May/12 10:31   13/Mar/14 14:17 28/Nov/12 02:46 3.4.0, 3.4.3, 3.4.4, 3.4.5 3.4.6, 3.5.0 build   0 10   zookeeper.server.NIOServerCnxn and zookeeper.server.NettyServerCnxn import com.sun.management.UnixOperatingSystemMXBean. This OperatingSystemMXBean class is not implemented by IBM Java or by open Java implementations.

In my case, I need IBM Java so I can run ZooKeeper on Power ppc64 servers.
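A sketch of one way to avoid the compile-time dependency, assuming reflection is acceptable: load {{com.sun.management.UnixOperatingSystemMXBean}} by name and report -1 when the class is absent, as on IBM Java. {{FdCount}} is a hypothetical helper, not the committed fix.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.lang.reflect.Method;

// Hypothetical helper: no compile-time reference to com.sun.management.
final class FdCount {
    private static final String UNIX_BEAN = "com.sun.management.UnixOperatingSystemMXBean";

    /** Open file descriptor count, or -1 if this JVM does not expose it. */
    static long openFileDescriptorCount() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        try {
            Class<?> unixBean = Class.forName(UNIX_BEAN);
            if (unixBean.isInstance(os)) {
                // Look the method up on the public interface, not on the implementation class.
                Method m = unixBean.getMethod("getOpenFileDescriptorCount");
                return ((Number) m.invoke(os)).longValue();
            }
        } catch (ReflectiveOperationException | RuntimeException e) {
            // Class missing (e.g. IBM Java) or call failed: report "unknown".
        }
        return -1L;
    }
}
```

Resolving the method through the interface rather than {{os.getClass()}} keeps the call working on newer JDKs, where the implementation class lives in a non-exported package.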
build 242225 No Perforce job exists for this issue. 8 12772
6 years, 2 weeks ago
Reviewed
0|i02jkn:
ZooKeeper ZOOKEEPER-1473

Committed proposal log retains triple the memory it needs to

Bug Open Major Unresolved Thawan Kooburat Henry Robinson Henry Robinson 29/May/12 15:58   05/Feb/20 07:16     3.7.0, 3.5.8 server   1 7   ZKDatabase.committedLog retains the past 500 transactions to enable fast catch-up. This works great, but it's using triple the memory it needs to by retaining three copies of the data part of any transaction.

* The first is in committedLog[i].request.request.hb - a heap-allocated {{ByteBuffer}}.
* The second is in committedLog[i].request.txn.data - a jute-serialised record of the transaction
* The third is in committedLog[i].packet.data - also jute-serialised, seemingly uninitialised data.

This means that a ZK-server could be using 1G of memory more than it should be in the worst case. We should use just one copy of the data, even if we really have to refer to it 3 times.
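A sketch of the "one copy, several references" idea, assuming the serialized transaction bytes are immutable once logged: each entry keeps a single {{byte[]}} and hands out read-only {{ByteBuffer}} views, which share that array instead of copying it. {{CommittedLogSketch}} is hypothetical; only the 500-entry bound comes from the description.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical bounded log that stores each transaction's bytes exactly once.
final class CommittedLogSketch {
    static final int MAX_ENTRIES = 500;

    static final class Entry {
        final long zxid;
        private final byte[] data; // the single copy; ownership is taken, not copied

        Entry(long zxid, byte[] data) {
            this.zxid = zxid;
            this.data = data;
        }

        // Each caller gets its own position/limit, but all views share the same bytes.
        ByteBuffer view() {
            return ByteBuffer.wrap(data).asReadOnlyBuffer();
        }
    }

    private final Deque<Entry> log = new ArrayDeque<>();

    synchronized void add(long zxid, byte[] serializedTxn) {
        if (log.size() == MAX_ENTRIES) {
            log.removeFirst(); // evict the oldest proposal
        }
        log.addLast(new Entry(zxid, serializedTxn));
    }

    synchronized int size() {
        return log.size();
    }

    synchronized Entry oldest() {
        return log.peekFirst();
    }
}
```

Any code that currently needs the request buffer, the txn data and the packet data could take a view of the same entry, so the memory cost stays at one copy per transaction.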
242226 No Perforce job exists for this issue. 3 12773
5 years, 51 weeks, 4 days ago 0|i02jkv:
ZooKeeper ZOOKEEPER-1472

WatchedEvent class missing from documentation

Bug Open Minor Unresolved Unassigned David Nickerson David Nickerson 25/May/12 10:34   25/May/12 10:34   3.3.5   documentation   1 2   org.apache.zookeeper.WatchedEvent is missing from the 3.3.5 documentation. documentation 242227 No Perforce job exists for this issue. 0 12774
7 years, 43 weeks, 6 days ago 0|i02jl3:
ZooKeeper ZOOKEEPER-1471

Jute generates invalid C++ code

Bug Resolved Minor Fixed Michi Mutsuzaki Michi Mutsuzaki Michi Mutsuzaki 20/May/12 21:10   30/Jun/12 07:01 30/Jun/12 02:44 3.4.3 3.4.4, 3.5.0 jute   0 4   There are 2 issues with the current jute generated C++ code.

1. Variable declaration for JRecord is incorrect. It looks something like this:
{code}
Id id;
{code}
It should be like this instead:
{code}
org::apache::zookeeper::data::Id mid;
{code}

2. The header file declares all the variables (except for JRecord ones) with "m" prefix, but the .cc file doesn't use the prefix.
242228 No Perforce job exists for this issue. 1 12775
7 years, 38 weeks, 5 days ago
Reviewed
0|i02jlb:
ZooKeeper ZOOKEEPER-1470

zkpython: close() should delete any watcher

Bug Open Minor Unresolved Unassigned Paul Giannaros Paul Giannaros 20/May/12 13:00   20/May/12 14:31   3.4.3   contrib-bindings   0 2 3600 3600 0% When calling zookeeper.close(handle), any connection watcher for the handle is not deleted. This is a source of memory leaks for applications that create and close lots of connections. Its damage can be mitigated to some degree by changing the watcher to some function that won't keep references to instances alive before calling close.

The fix is just to add a free_pywatcher(..) call in the close sequence. Alternatively you could allow set_watcher(handle, None) as a way of deleting the watcher, but it's probably best to take care of it on close too.
0% 0% 3600 3600 memory_leak, python 242229 No Perforce job exists for this issue. 0 12776
7 years, 44 weeks, 4 days ago 0|i02jlj:
ZooKeeper ZOOKEEPER-1469

Adding Cross-Realm support for secure Zookeeper client authentication

Improvement Reopened Major Unresolved Eugene Joseph Koontz Himanshu Vashishtha Himanshu Vashishtha 20/May/12 02:13   05/Feb/20 07:15   3.4.3 3.7.0, 3.5.8 documentation   0 11   There is a use case where one needs to support cross-realm authentication for a ZooKeeper cluster. One use case is HBase Replication: HBase supports replicating data to multiple slave clusters, where the latter might be running in different realms. With current ZooKeeper security, the region servers of the master HBase cluster are not able to query the ZooKeeper quorum members of the slave cluster. This jira is about adding such cross-realm support.
242230 No Perforce job exists for this issue. 1 12777
3 years, 39 weeks, 2 days ago 0|i02jlr:
ZooKeeper ZOOKEEPER-1468

Accurately name znode count in "four-letter words"

Improvement Open Minor Unresolved Unassigned Adam Rosien Adam Rosien 18/May/12 13:38   18/May/12 13:38           0 1   The 'stat' and 'srvr' four-letter word commands refer to "Node Count" as the number of znodes, but this is an ambiguous label (cluster nodes? znodes?). I suggest renaming the label to "ZNode Count", or something similar.

This will break existing parsers of the commands' output.
242231 No Perforce job exists for this issue. 0 12778
7 years, 44 weeks, 6 days ago 0|i02jlz:
ZooKeeper ZOOKEEPER-1467

Make server principal configurable at client side.

Improvement Closed Major Fixed Sujith Simon Laxman Laxman 16/May/12 07:57   14/Feb/20 10:23 01/Oct/19 03:37 3.4.3, 3.4.4, 3.5.0 3.6.0, 3.5.7 java client   0 18 0 8400   Server principal on client side is derived using hostname.

org.apache.zookeeper.ClientCnxn.SendThread.startConnect()
{code}
try {
zooKeeperSaslClient = new ZooKeeperSaslClient("zookeeper/"+addr.getHostName());
}
{code}

This may cause problems when an admin wants a customized principal such as zookeeper/clusterid@HADOOP.COM, where clusterid is the cluster identifier rather than the host name.

IMO, the server principal should also be configurable, as Hadoop does.
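Following the release note recorded for this issue (a "zookeeper.clusterName" system property that, when set, replaces the host name as the instance part of the server principal), the selection logic could look roughly like this. {{ServerPrincipal}} is a hypothetical helper, and the realm (e.g. @HADOOP.COM) is left to the Kerberos configuration.

```java
// Hypothetical helper: choose the instance part of the server's Kerberos principal.
final class ServerPrincipal {
    static final String SERVICE = "zookeeper";
    static final String CLUSTER_NAME_PROPERTY = "zookeeper.clusterName";

    static String forHost(String hostName) {
        String clusterName = System.getProperty(CLUSTER_NAME_PROPERTY);
        // Use the cluster name if it is set and non-empty, otherwise the host name.
        String instance = (clusterName != null && !clusterName.isEmpty()) ? clusterName : hostName;
        return SERVICE + "/" + instance;
    }
}
```

Falling back to the host name keeps existing deployments working unchanged when the property is not set.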
100% 100% 8400 0 Security, client, kerberos, pull-request-available, sasl 239707 No Perforce job exists for this issue. 2 2587
22 weeks, 3 days ago Allow system property "zookeeper.clusterName", if defined, to be used as the instance portion of zookeeper server's Kerberos principal name. Otherwise, server's hostname will be used. 0|i00sqf:
ZooKeeper ZOOKEEPER-1466

QuorumCnxManager.shutdown missing synchronization

Bug Resolved Blocker Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 15/May/12 13:21   30/Jun/12 07:01 29/Jun/12 16:13 3.4.0, 3.3.5, 3.5.0 3.3.6, 3.4.4, 3.5.0 quorum   0 4   org.apache.zookeeper.server.quorum.QuorumCnxManager.shutdown is not being synchronized even though it's accessed by multiple threads. 242232 No Perforce job exists for this issue. 1 12779
7 years, 38 weeks, 5 days ago
Reviewed
0|i02jm7:
ZooKeeper ZOOKEEPER-1465

Cluster availability following new leader election takes a long time with large datasets - is correlated to dataset size

Bug Resolved Critical Fixed Camille Fournier Alex Gvozdenovic Alex Gvozdenovic 10/May/12 10:47   17/Jul/12 20:33 05/Jul/12 12:50 3.4.3 3.4.4, 3.5.0 leaderElection   0 12   When re-electing a new leader of a cluster, it takes a long time for the cluster to become available if the dataset is large

Test Data
----------
650 MB snapshot size
20k nodes of varied size
3 member cluster

On 3.4.x branch (http://svn.apache.org/repos/asf/zookeeper/branches/branch-3.4?r=1244779)
------------------------------------------------------------------------------------------

Takes 3-4 minutes to bring up a cluster from cold
Takes 40-50 secs to recover from a leader failure
Takes 10 secs for a new follower to join the cluster

Using the 3.3.5 release on the same hardware with the same dataset
-----------------------------------------------------------------

Takes 10-20 secs to bring up a cluster from cold
Takes 10 secs to recover from a leader failure
Takes 10 secs for a new follower to join the cluster

I can see from the logs in 3.4.x that once a new leader is elected, it pushes a new snapshot to each of the followers who need to save it before they ack the leader who can then mark the cluster as available.

The kit being used is a low-spec VM, so the times taken are not relevant per se; what matters is that a snapshot is always sent even though there is no difference between the persisted state on each peer.
No data is being added to the cluster while the peers are being restarted.
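For illustration only, a simplified decision that avoids a snapshot when the follower already has the leader's state. It is modelled loosely on the DIFF/TRUNC/SNAP sync modes a ZooKeeper leader chooses between; the method and its parameters are hypothetical and ignore details such as epochs.

```java
// Hypothetical, simplified choice of how a new leader syncs one follower.
final class SyncModeSketch {
    enum Mode { DIFF, TRUNC, SNAP }

    /**
     * @param peerLastZxid     last zxid the follower has logged
     * @param leaderLastZxid   last zxid the new leader has logged
     * @param minCommittedZxid oldest zxid still held in the leader's in-memory committed log
     * @param haveCommittedLog false if that in-memory log is empty (e.g. right after a cold start)
     */
    static Mode choose(long peerLastZxid, long leaderLastZxid,
                       long minCommittedZxid, boolean haveCommittedLog) {
        if (peerLastZxid == leaderLastZxid) {
            return Mode.DIFF; // identical state: an empty diff is enough, no snapshot needed
        }
        if (peerLastZxid > leaderLastZxid) {
            return Mode.TRUNC; // follower has txns the leader never committed
        }
        if (haveCommittedLog && peerLastZxid >= minCommittedZxid) {
            return Mode.DIFF; // replay the missing txns from memory
        }
        return Mode.SNAP; // too far behind: fall back to a full snapshot
    }
}
```

The first check is the one that matters for this report: with no writes during the restart, every peer has the same last zxid, so nothing needs to be transferred.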

238940 No Perforce job exists for this issue. 5 12502
7 years, 36 weeks, 2 days ago 0|i02hwn:
ZooKeeper ZOOKEEPER-1464

document that event notification is single threaded in java/c client implementations

Improvement Open Major Unresolved Unassigned Patrick D. Hunt Patrick D. Hunt 09/May/12 16:41   09/May/12 16:41       documentation   0 0   The docs don't currently mention that there's a single thread delivering watches. Callees should be aware of this: it typically means not making blocking calls (especially ones that wait on other events!) and limiting the time spent in the routine. 238813 No Perforce job exists for this issue. 0 12780
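A plain-Java sketch of what this means for callers, using a single-thread executor as a stand-in for the client's event thread (a model, not the client's implementation): callbacks run strictly one at a time and in order, so a slow handler should hand its work to another executor instead of blocking. {{SingleThreadedDelivery}} and {{offload}} are hypothetical names.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical model of single-threaded event delivery.
final class SingleThreadedDelivery {
    // Stand-in for the event thread: one callback at a time, in submission order.
    private final ExecutorService eventThread = Executors.newSingleThreadExecutor();
    // Separate pool for slow work so the event thread is released immediately.
    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    void deliver(String event, Consumer<String> callback) {
        eventThread.execute(() -> callback.accept(event));
    }

    // Wraps a slow handler so the callback only enqueues work instead of blocking.
    Consumer<String> offload(Consumer<String> slowHandler) {
        return event -> workers.execute(() -> slowHandler.accept(event));
    }

    // Drain the event thread first so every offloaded task is submitted before workers stop.
    void shutdown() throws InterruptedException {
        eventThread.shutdown();
        eventThread.awaitTermination(5, TimeUnit.SECONDS);
        workers.shutdown();
        workers.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

A handler that blocked inside {{deliver}} (for example, waiting for another event) would stall every later notification, since they queue behind it on the same thread.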
7 years, 46 weeks, 1 day ago 0|i02jmf:
ZooKeeper ZOOKEEPER-1463

external inline function is not compatible with C99

Bug Resolved Major Duplicate Michael Hu Michael Hu Michael Hu 07/May/12 17:52   11/May/12 01:26 11/May/12 01:26 3.4.3, 3.3.5 3.4.4, 3.5.0 build   0 0 360 360 0% debian linux x64 The ZooKeeper hashtable_itr.h file uses an external inline function, which is not compatible with C99. This causes problems when compiling with other libraries, such as a code coverage library.
---
hashtable_itr.h:37: error: 'cov_v_cab2c78b' is static but used in inline
function 'hashtable_iterator_key' which is not static
---

The easy fix would be to put the following line in hashtable_itr.c, which ignores this inline warning:
#pragma GCC diagnostic ignored "-Winline"
0% 0% 360 360 external, inline 238471 No Perforce job exists for this issue. 1 12781
7 years, 45 weeks, 6 days ago 0|i02jmn:
ZooKeeper ZOOKEEPER-1462

Read-only server does not initialize database properly

Bug Closed Critical Fixed Thawan Kooburat Thawan Kooburat Thawan Kooburat 02/May/12 21:37   13/Mar/14 14:16 02/Oct/13 18:42 3.4.3 3.4.6 server   0 5   Brief Description:
When a participant or observer gets partitioned and restarts as a read-only server, its ZkDb doesn't get reinitialized. This causes the RO server to drop any incoming request with zxid > 0.

Error message:
Refusing session request for client /xx.xx.xx.xx:39875
as it has seen zxid 0x2e00405fd9 our last zxid is 0x0 client must try another server

Steps to reproduce:
Start an RO-enabled observer connecting to an ensemble. Kill the ensemble and wait until the observer restarts in RO mode. The zxid of this observer should be 0.

Description:
Before a server transitions into LOOKING state, its database gets closed as part of the shutdown sequence. The databases of the leader, followers and observers get initialized as a side effect of participating in the leader election protocol (e.g. an observer calls registerWithLeader() and getLastLoggedZxid(), which initializes the db if it is not already initialized).

However, the RO server does not participate in this protocol, so its DB doesn't get initialized properly.
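A minimal sketch of the missing step, with a hypothetical {{Database}} interface standing in for ZKDatabase: the read-only server loads its database explicitly at startup instead of relying on leader election to do it, and the session check mirrors the refusal in the error message above. All names here are illustrative, not ZooKeeper code.

```java
// Hypothetical model of the read-only server's startup; not ZooKeeper code.
final class ReadOnlyServerSketch {
    /** Stand-in for ZKDatabase: loading returns the last logged zxid. */
    interface Database {
        long loadAndGetLastZxid();
    }

    private final Database db;
    private boolean loaded;
    private long lastZxid; // stays 0 until loaded, matching "our last zxid is 0x0"

    ReadOnlyServerSketch(Database db) {
        this.db = db;
    }

    /** Explicit load at RO startup, since the RO server skips leader election. */
    synchronized void ensureLoaded() {
        if (!loaded) {
            lastZxid = db.loadAndGetLastZxid();
            loaded = true;
        }
    }

    /** Mirrors the refusal: a client that has seen a newer zxid must try another server. */
    synchronized boolean acceptSession(long clientLastZxid) {
        return clientLastZxid <= lastZxid;
    }
}
```

Without the {{ensureLoaded()}} call, any client that has seen a non-zero zxid is refused, which matches the reported symptom.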
237890 No Perforce job exists for this issue. 1 12782
6 years, 2 weeks ago 0|i02jmv:
ZooKeeper ZOOKEEPER-1461

Zookeeper C client doesn't check for NULL before dereferencing in prepend_string

Improvement Resolved Major Duplicate Stephen Tyree Stephen Tyree Stephen Tyree 01/May/12 16:35   29/Jul/12 01:37 02/May/12 10:07 3.3.5   c client   0 0 0 0 0% prepend_string, called before any checks for NULL in the c client for many API functions, has this line (zookeeper 3.3.5):

if (zh->chroot == NULL)

That means that before you check for NULL, you are dereferencing the pointer. This bug does not exist in the 3.4.* branch for whatever reason, but it still remains in the 3.3.* line. A patch which fixes it would make the line as follows:

if (zh == NULL || zh->chroot == NULL)

I would do that for you, but I don't know how to patch the 3.3.5 branch.
0% 0% 0 0 237704 No Perforce job exists for this issue. 1 12783
7 years, 47 weeks, 1 day ago 0|i02jn3:
ZooKeeper ZOOKEEPER-1460

IPv6 literal address not supported for quorum members

Bug Closed Major Fixed Joseph Walton Chris Dolan Chris Dolan 30/Apr/12 15:49   21/Jul/16 16:18 23/Jun/16 16:21 3.4.3 3.5.2, 3.6.0 quorum   5 19   Via code inspection, I see that the "server.nnn" configuration key does not support literal IPv6 addresses because the property value is split on ":". In v3.4.3, the problem is in QuorumPeerConfig:

{noformat}
String parts[] = value.split(":");
InetSocketAddress addr = new InetSocketAddress(parts[0],
Integer.parseInt(parts[1]));
{noformat}

In the current trunk (http://svn.apache.org/viewvc/zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java?view=markup) this code has been refactored into QuorumPeer.QuorumServer, but the bug remains:

{noformat}
String serverClientParts[] = addressStr.split(";");
String serverParts[] = serverClientParts[0].split(":");
addr = new InetSocketAddress(serverParts[0],
Integer.parseInt(serverParts[1]));
{noformat}

This bug probably affects very few users because most will naturally use a hostname rather than a literal IP address. But given that IPv6 addresses are supported for clients via ZOOKEEPER-667 it seems that server support should be fixed too.
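A sketch of a parser that accepts an IPv6 literal, assuming a bracketed {{[address]:port:port}} notation (the brackets are an assumption borrowed from URL syntax; an unbracketed IPv6 literal cannot be split on ":" unambiguously). {{QuorumAddressParser}} is hypothetical.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical parser for "host:port[:port...]" where host may be "[ipv6-literal]".
final class QuorumAddressParser {
    static List<String> split(String value) {
        List<String> parts = new ArrayList<>();
        String rest = value;
        if (value.startsWith("[")) {
            int close = value.indexOf(']');
            if (close < 0) {
                throw new IllegalArgumentException("missing ']' in " + value);
            }
            parts.add(value.substring(1, close)); // the IPv6 literal, brackets removed
            rest = value.substring(close + 1);
            if (rest.isEmpty()) {
                return parts;
            }
            if (rest.charAt(0) != ':') {
                throw new IllegalArgumentException("expected ':' after ']' in " + value);
            }
            rest = rest.substring(1);
        }
        // Only the part after the host is split on ':', so IPv6 colons are never touched.
        parts.addAll(Arrays.asList(rest.split(":", -1)));
        return parts;
    }

    static InetSocketAddress quorumAddress(String value) {
        List<String> parts = split(value);
        if (parts.size() < 2) {
            throw new IllegalArgumentException("no port in " + value);
        }
        return new InetSocketAddress(parts.get(0), Integer.parseInt(parts.get(1)));
    }
}
```

Hostnames and IPv4 addresses take the unbracketed path, so existing configurations parse exactly as before.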
237568 No Perforce job exists for this issue. 6 12784
3 years, 39 weeks ago IPv6 addresses are now properly parsed in the config
Reviewed
0|i02jnb:
ZooKeeper ZOOKEEPER-1459

ZOOKEEPER-1833 Standalone ZooKeeperServer is not closing the transaction log files on shutdown

Sub-task Closed Major Fixed Rakesh Radhakrishnan Rakesh Radhakrishnan Rakesh Radhakrishnan 30/Apr/12 04:11   19/Dec/19 12:30 07/Dec/13 05:19 3.4.0 3.4.6, 3.5.0 server   0 9   When the standalone ZK server shuts down, it only clears the zkdatabase and does not close the transaction log streams. As a result, deleting the temporary files in unit tests on Windows fails.
ZooKeeperServer.java
{noformat}
if (zkDb != null) {
    zkDb.clear();
}
{noformat}

Suggestion: close the zkDb as follows, which in turn takes care of closing the transaction logs:
{noformat}
if (zkDb != null) {
    zkDb.clear();
    try {
        zkDb.close();
    } catch (IOException ie) {
        LOG.warn("Error closing logs ", ie);
    }
}
{noformat}
237452 No Perforce job exists for this issue. 10 12785
5 years, 44 weeks, 2 days ago
Incompatible change
0|i02jnj:
ZooKeeper ZOOKEEPER-1458

Parent's cversion doesn't match the sequence number that get assigned to a child node with the SEQUENTIAL flag on.

Bug Resolved Major Not A Problem Patrick D. Hunt Andrey Kornev Andrey Kornev 29/Apr/12 21:00   30/Apr/12 19:10 30/Apr/12 19:10 3.4.3   server   0 0   All If I have a child delete op interleaving two child create ops, the second child create will nevertheless have the path suffix incremented only by 1 rather than by 2. Is this expected? The 3.3.5 version takes into account the delete and increments the sequence by 2.

PrepRequestProcessor uses the parent's cversion to generate the child's sequence suffix. However it appears that this particular cversion only counts "create" operations and it doesn't take into account the deletes. Strangely enough, the parent stats returned by getData() show the correct cversion with all the creates and deletes accounted for.

It looks like the first cversion comes from the ChangeRecord for the parent node stored in the ZooKeeperServer.outstandingChangesForPath map, and the second one (the one returned by getData()) comes from the DataTree.

Here's a simple example that reproduces the situation.

{code}
zk.create("/parent", null, OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
Stat stat = new Stat();

zk.getData("/parent", false, stat);
stat.getCVersion(); // returns 0 -- expected;

String actualPath = zk.create("/parent/child", null, OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
// actualPath is "/parent/child0000000000" -- expected.

zk.getData("/parent", false, stat);
stat.getCVersion(); // returns 1 -- expected;

zk.getData(actualPath, false, stat);
zk.delete(actualPath, stat.getVersion()); // delete the child node

zk.getData("/parent", false, stat);
stat.getCVersion(); // returns 2;

// create another child
actualPath = zk.create("/parent/child", null, OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
// returned "/parent/child0000000001" but expected "/parent/child0000000002"

zk.getData("/parent", false, stat);
stat.getCVersion(); // returns 3;
{code}
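For reference, a sketch of how the suffix in the example paths relates to the counter: a 10-digit, zero-padded number formatted from the parent's cversion at the time of the create, so a counter that ignores deletes yields 0000000001 where the DataTree's cversion would give 0000000002. {{SequentialName}} is a hypothetical helper, and the padding width is inferred from the example paths.

```java
import java.util.Locale;

// Hypothetical helper: name a sequential child from the parent's counter.
final class SequentialName {
    static String childPath(String requestedPath, int parentCversion) {
        // 10-digit, zero-padded suffix, as in "/parent/child0000000001"
        return requestedPath + String.format(Locale.ENGLISH, "%010d", parentCversion);
    }
}
```

With one create and one delete already applied, {{childPath("/parent/child", 1)}} and {{childPath("/parent/child", 2)}} produce the two names discussed in this issue.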
237430 No Perforce job exists for this issue. 0 12786
7 years, 47 weeks, 3 days ago 0|i02jnr:
ZooKeeper ZOOKEEPER-1457

Ephemeral node deleted for unexpired sessions

Bug Open Major Unresolved Unassigned Neha Narkhede Neha Narkhede 26/Apr/12 15:25   30/Apr/12 15:00   3.3.4       0 5   This week, we saw a potential bug with ZooKeeper 3.3.4. In an attempt to add a separate disk for ZooKeeper transaction logs, our SysOps team added new disks to all the ZooKeeper servers in our production cluster at around the same time. Right after this, we saw degraded performance on our ZooKeeper cluster. And yes, I agree that this degraded behavior is expected, and we could've done a better job by upgrading one server at a time. However, the observed impact was that ephemeral nodes got deleted without the sessions expiring on the ZooKeeper clients.

Let me try and describe what I've observed from the Kafka and ZK server logs - Kafka client has a session established with ZK, say Session A, that it has been using successfully. At the time of the degraded ZK performance issue, Session A expires. Kafka's ZkClient tries to establish another session with ZK. After 9 seconds, it establishes a session, say Session B and tries to use it for creating a znode. This operation fails with a NodeExists error since another session, say session C, has created that znode. This is considered OK since ZkClient retries an operation transparently if it gets disconnected and sometimes you can get NodeExists. But then later, session C expires and hence the ephemeral node is deleted from ZK. This leads to unexpected errors in Kafka since its session, Session B, is still valid and hence it expects the znode to be there. The issue is that session C was established, created the znode and expired, without the zookeeper client on Kafka ever knowing about it.

236698 No Perforce job exists for this issue. 0 32547
7 years, 47 weeks, 3 days ago 0|i05xmn:
ZooKeeper ZOOKEEPER-1456

sessions cannot specify whether they require kerberos authenticated sessions or not

Bug Open Major Unresolved Unassigned Patrick D. Hunt Patrick D. Hunt 26/Apr/12 13:17   27/Apr/12 10:42           0 4   When creating a session there is no way for a client to specify that they require a kerberos (via sasl) authenticated session. Similarly there is no way to request an unauthenticated session if kerberos has been configured at the jvm level. 236849 No Perforce job exists for this issue. 0 32548
7 years, 48 weeks ago 0|i05xmv:
ZooKeeper ZOOKEEPER-1455

there is no way to determine if a session is sasl authenticated or not

Bug Open Critical Unresolved Unassigned Patrick D. Hunt Patrick D. Hunt 26/Apr/12 13:12   25/Sep/13 05:53           3 9   The ZooKeeper interface provides no way to determine whether the session is SASL authenticated. An event is sent to the watcher when SASL authentication completes; however, there is no way to determine whether there is an intent to negotiate via SASL. As a result, the event cannot be used to hold off sending messages until authentication has completed. See HADOOP-8315 236850 No Perforce job exists for this issue. 0 32549
7 years, 46 weeks ago 0|i05xn3:
ZooKeeper ZOOKEEPER-1454

Document how to run autoreconf if cppunit is installed in a non-standard directory

Improvement Resolved Trivial Fixed Michi Mutsuzaki Michi Mutsuzaki Michi Mutsuzaki 25/Apr/12 17:18   30/Jun/12 07:01 30/Jun/12 02:51   3.3.6, 3.4.4, 3.5.0 c client   0 3   By default, the source distribution of cppunit is installed under /usr/local. When you run {{autoreconf -if}}, you get an error like this:

{code}
$ autoreconf -if
configure.ac:37: warning: macro `AM_PATH_CPPUNIT' not found in library
configure.ac:37: warning: macro `AM_PATH_CPPUNIT' not found in library
configure.ac:37: error: possibly undefined macro: AM_PATH_CPPUNIT
If this token and others are legitimate, please use m4_pattern_allow.
See the Autoconf documentation.
autoreconf: /usr/local/bin/autoconf failed with exit status: 1
{code}

This is because {{cppunit.m4}} is installed under /usr/local/share/aclocal, but aclocal only looks at {{/usr/share/aclocal-$VERSION}} and {{/usr/share/aclocal}} assuming it was configured with {{--prefix=/usr}}. There are 3 ways to specify additional paths.

1. Set {{ACLOCAL}}.

{code}
ACLOCAL="aclocal -I /usr/local/share/aclocal" autoreconf -if
{code}

2. Set {{ACLOCAL_PATH}}.

{code}
ACLOCAL_PATH=/usr/local/share/aclocal autoreconf -if
{code}

3. Set {{ACLOCAL_FLAGS}}.

{code}
ACLOCAL_FLAGS="-I /usr/local/share/aclocal" autoreconf -if
{code}

Apparently older versions of autoreconf don't respect ACLOCAL_PATH or ACLOCAL_FLAGS, so using ACLOCAL is probably the best way to fix it. I'll update src/c/README to document this.

--Michi
236696 No Perforce job exists for this issue. 1 33278
7 years, 38 weeks, 5 days ago
Reviewed
0|i06253:
ZooKeeper ZOOKEEPER-1453

corrupted logs may not be correctly identified by FileTxnIterator

Bug Open Critical Unresolved Unassigned Patrick D. Hunt Patrick D. Hunt 24/Apr/12 18:05   18/Mar/16 16:05   3.3.3   server   1 7   See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate whether the available logs are valid. In some cases (say, a truncated record and a single txnlog in the datadir) we cannot tell a corrupt file apart from simply reaching the end of the file. 236868 No Perforce job exists for this issue. 6 32550
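To illustrate the distinction, a sketch over a simplified length-prefixed record format (not the actual txnlog layout, which also carries checksums): end-of-stream before a record starts is a clean end, while end-of-stream inside a record means truncation. {{LengthPrefixedReader}} is hypothetical.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Simplified record format: 4-byte big-endian length, then that many bytes.
final class LengthPrefixedReader {
    enum Result { RECORD, CLEAN_EOF, TRUNCATED }

    private static final int MAX_RECORD = 1 << 20; // sanity bound on the length field

    private final DataInputStream in;
    private byte[] lastRecord;

    LengthPrefixedReader(InputStream in) {
        this.in = new DataInputStream(in);
    }

    Result next() throws IOException {
        int first = in.read();
        if (first < 0) {
            return Result.CLEAN_EOF; // EOF on a record boundary: normal end of file
        }
        try {
            // Rebuild the 4-byte length, reusing the byte already consumed above.
            int len = (first << 24) | (in.readUnsignedByte() << 16)
                    | (in.readUnsignedByte() << 8) | in.readUnsignedByte();
            if (len < 0 || len > MAX_RECORD) {
                throw new IOException("implausible record length " + len);
            }
            byte[] buf = new byte[len];
            in.readFully(buf);
            lastRecord = buf;
            return Result.RECORD;
        } catch (EOFException e) {
            return Result.TRUNCATED; // EOF inside a record: the file is cut short
        }
    }

    byte[] lastRecord() {
        return lastRecord;
    }
}
```

A recovery loop built on this can refuse to start, or at least warn loudly, on TRUNCATED instead of silently treating it as the end of the log.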
4 years, 6 days ago 0|i05xnb:
ZooKeeper ZOOKEEPER-1452

zoo_multi() & zoo_amulti() update operations for zkpython

Improvement Patch Available Major Unresolved Aravind Narayanan Aravind Narayanan Aravind Narayanan 19/Apr/12 20:46   05/Feb/20 07:11     3.7.0, 3.5.8 contrib-bindings   1 6 1209600 1209600 0% ZooKeeper's python bindings (src/contrib/zkpython) are missing multi-update support ({{zoo_multi()}} & {{zoo_amulti()}}) that was added to the C client recently. This issue is to bridge this gap, and add support for multi-update operations to the Python bindings. 0% 0% 1209600 1209600 python 236509 No Perforce job exists for this issue. 4 2513
3 years, 39 weeks, 2 days ago Adds new `zoo_multi()` and `zoo_amulti()` functionality to the zkpython bindings for zookeeper.

Includes a new unit test. Also used the functions from a python program that uses zkpython.

All existing unit tests still pass.
python, zkpython 0|i00s9z:
ZooKeeper ZOOKEEPER-1451

C API improperly logs getaddrinfo failures on Linux when using glibc

Bug Resolved Trivial Fixed Stephen Tyree Stephen Tyree Stephen Tyree 19/Apr/12 14:04   25/Apr/12 16:29 25/Apr/12 16:29 3.4.3 3.5.0 c client   0 1   Linux when using glibc This is how the code currently logs getaddrinfo errors:

{code}
errno = getaddrinfo_errno(rc);
LOG_ERROR(("getaddrinfo: %s\n", strerror(errno)));
{code}

On Linux, specifically when using glibc, there is a better function for logging getaddrinfo errors called gai_strerror. An example:

{code}
LOG_ERROR(("getaddrinfo: %s\n", gai_strerror(rc)));
{code}

Unlike the errno-based version, it doesn't miss a lot of cases.
236460 No Perforce job exists for this issue. 1 32551
7 years, 48 weeks, 1 day ago 0|i05xnj:
ZooKeeper ZOOKEEPER-1450

Backport ZOOKEEPER-1294 fix to 3.4 and 3.3

Task Resolved Major Fixed Norman Bishop Norman Bishop Norman Bishop 19/Apr/12 13:37   02/Mar/16 20:37 22/Apr/12 15:28 3.4.3, 3.3.5 3.3.6, 3.4.4 server   0 0   The bug from ZOOKEEPER-1294 affects 3.4 and 3.3 as well, and the patch should be backported. 236459 No Perforce job exists for this issue. 3 33279
7 years, 48 weeks, 4 days ago 0|i0625b:
ZooKeeper ZOOKEEPER-1449

Ephemeral znode not deleted after session has expired on one follower (quorum is in an inconsistent state)

Bug Resolved Major Cannot Reproduce Patrick D. Hunt Daniel Lord Daniel Lord 17/Apr/12 14:26   02/Oct/13 12:16 02/Oct/13 05:58         0 2   I've been running into this situation in our labs fairly regularly, where we get a single follower in an inconsistent state with dangling ephemeral znodes. Here is all of the information that I have right now. Please ask if there is anything else that would be useful.

Here is a quick snapshot of the state of the ensemble where you can see it is out of sync across several znodes:

-bash-3.2$ echo srvr | nc il23n04sa-zk001 2181
Zookeeper version: 3.3.3-cdh3u2--1, built on 10/14/2011 05:17 GMT
Latency min/avg/max: 0/7/25802
Received: 64002
Sent: 63985
Outstanding: 0
Zxid: 0x500000a41
Mode: follower
Node count: 497

-bash-3.2$ echo srvr | nc il23n04sa-zk002 2181
Zookeeper version: 3.3.3-cdh3u2--1, built on 10/14/2011 05:17 GMT
Latency min/avg/max: 0/13/79032
Received: 74320
Sent: 74276
Outstanding: 0
Zxid: 0x500000a41
Mode: leader
Node count: 493

-bash-3.2$ echo srvr | nc il23n04sa-zk003 2181
Zookeeper version: 3.3.3-cdh3u2--1, built on 10/14/2011 05:17 GMT
Latency min/avg/max: 0/2/25234
Received: 187310
Sent: 187320
Outstanding: 0
Zxid: 0x500000a41
Mode: follower
Node count: 493

All of the zxids match up just fine, but zk001 has 4 more nodes in its node count than the other two (including the leader). When I use a ZooKeeper client to connect directly to zk001, I can see the following znode that should no longer exist:

[zk: localhost:2181(CONNECTED) 0] stat /siri/Douroucouli/clients/il23n04sa-app004.siri.apple.com:38096
cZxid = 0x40000001a
ctime = Mon Apr 16 11:00:47 PDT 2012
mZxid = 0x40000001a
mtime = Mon Apr 16 11:00:47 PDT 2012
pZxid = 0x40000001a
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x236bc504cb50002
dataLength = 0
numChildren = 0

This node does not exist using the client to connect to either of the other two members of the ensemble.
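As a rough consistency check based on the srvr outputs above, a sketch that parses the "Key: value" lines and flags servers that report the same Zxid but different node counts. This is a heuristic, not a ZooKeeper tool; {{SrvrOutput}} is hypothetical, and the outputs should be sampled while no writes are in flight.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper for comparing "srvr" four-letter-word output across servers.
final class SrvrOutput {
    /** Parses "Key: value" lines; the first ':' on each line separates key from value. */
    static Map<String, String> parse(String output) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : output.split("\\R")) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                fields.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
            }
        }
        return fields;
    }

    /** Same Zxid but different node counts: worth a closer look (heuristic only). */
    static boolean suspectDivergence(Map<String, String> a, Map<String, String> b) {
        String zxid = a.get("Zxid");
        return zxid != null && zxid.equals(b.get("Zxid"))
                && !Objects.equals(a.get("Node count"), b.get("Node count"));
    }
}
```

Applied to zk001 (Node count 497) and zk002 (Node count 493), both at Zxid 0x500000a41, the check flags the pair, matching what is described here.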

I searched through the logs for that session id and it looks like it was established and closed cleanly. There were several leadership/quorum problems during the course of the session but it looks like it should have been shut down properly. Neither the session nor the znode show up in the "dump" on the leader but the problem znode does show up in the "dump" on zk001.

2012-04-16 11:00:47,637 - INFO [CommitProcessor:2:NIOServerCnxn@1580] - Established session 0x236bc504cb50002 with negotiated timeout 15000 for client /17.202.71.201:38971
2012-04-16 11:20:59,341 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@770] - Client attempting to renew session 0x236bc504cb50002 at /17.202.71.201:50841
2012-04-16 11:20:59,342 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1580] - Established session 0x236bc504cb50002 with negotiated timeout 15000 for client /17.202.71.201:50841
2012-04-16 11:21:09,343 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@634] - EndOfStreamException: Unable to read additional data from client sessionid 0x236bc504cb50002, likely client has closed socket
2012-04-16 11:21:09,343 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1435] - Closed socket connection for client /17.202.71.201:50841 which had sessionid 0x236bc504cb50002
2012-04-16 11:21:20,352 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@1435] - Closed socket connection for client /17.202.71.201:38971 which had sessionid 0x236bc504cb50002
2012-04-16 11:21:22,151 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@770] - Client attempting to renew session 0x236bc504cb50002 at /17.202.71.201:38166
2012-04-16 11:21:22,152 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@1580] - Established session 0x236bc504cb50002 with negotiated timeout 15000 for client /17.202.71.201:38166
2012-04-16 11:27:17,902 - INFO [ProcessThread:-1:PrepRequestProcessor@387] - Processed session termination for sessionid: 0x236bc504cb50002
2012-04-16 11:27:17,904 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1435] - Closed socket connection for client /17.202.71.201:38166 which had sessionid 0x236bc504cb50002

The only way I've been able to recover from this situation is to shut down the problem follower, delete its snapshots and let it resync with the leader.

I'll attach the full log4j logs, the txn logs, the snapshots and the conf files for each member of the ensemble. Please let me know what other information is useful.
236145 No Perforce job exists for this issue. 1 32552
6 years, 25 weeks, 1 day ago 0|i05xnr:
ZooKeeper ZOOKEEPER-1448

Node+Quota creation in transaction log can crash leader startup

Bug Closed Critical Fixed Flavio Paiva Junqueira Botond Hejj Botond Hejj 17/Apr/12 12:15   13/Mar/14 14:17 05/Sep/13 17:50 3.3.5 3.4.6, 3.5.0 server   0 7   Hi,

I've found a bug in ZooKeeper related to quota creation that can shut down the ZooKeeper leader on startup.

Steps to reproduce:
1. create /quota_bug
2. setquota -n 10000 /quota_bug
3. stop the whole ensemble (the previous operations should be in the transaction log)
4. start all the servers
5. the elected leader will shut down with an exception (Missing stat node for count /zookeeper/quota/quota_bug/zookeeper_stats)

I've debugged what is happening and found the following problem:
On startup each server loads the last snapshot and replays the last transaction log. While doing this, it fills the pTrie variable of the DataTree with the paths of the nodes that have quotas.
After the leader is elected, the leader server loads the snapshot and the last transaction log again, but it doesn't clean up the pTrie variable. This means pTrie still contains the "/quota_bug" path. When "create /quota_bug" is then processed from the transaction log, the DataTree already thinks that the quota nodes ("/zookeeper/quota/quota_bug/zookeeper_limits" and "/zookeeper/quota/quota_bug/zookeeper_stats") have been created, but those node creations actually come later in the transaction log. This leads to the missing stat node exception.

I think clearing the pTrie should solve this problem.
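The ordering problem described above can be reproduced in a few lines. This is a hypothetical, heavily simplified model, not ZooKeeper's DataTree or PathTrie API: `MiniTree`, `apply` and `reload` are made-up names, a set stands in for pTrie, and a `setquota` log entry creates both quota nodes at once. It shows that replaying the same log a second time fails unless the quota index is cleared first.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for DataTree: znode paths plus a quota-path index (the role of pTrie). */
class MiniTree {
    private final Set<String> nodes = new HashSet<>();
    private final Set<String> quotaPaths = new HashSet<>();

    private static String statsNode(String path) {
        return "/zookeeper/quota" + path + "/zookeeper_stats";
    }

    /** Applies one logged operation: "create <path>" or "setquota <path>". */
    void apply(String op) {
        String[] parts = op.split(" ", 2);
        String path = parts[1];
        if (parts[0].equals("create")) {
            nodes.add(path);
            // A path listed in the quota index must already have its stats node.
            if (quotaPaths.contains(path) && !nodes.contains(statsNode(path))) {
                throw new IllegalStateException("Missing stat node for count " + statsNode(path));
            }
        } else if (parts[0].equals("setquota")) {
            nodes.add("/zookeeper/quota" + path + "/zookeeper_limits");
            nodes.add(statsNode(path));
            quotaPaths.add(path);
        }
    }

    /** Rebuilds state by replaying the log; clearQuotaIndex=false reproduces the bug. */
    void reload(List<String> log, boolean clearQuotaIndex) {
        nodes.clear();
        if (clearQuotaIndex) {
            quotaPaths.clear(); // the proposed fix: forget quota paths from the previous load
        }
        for (String op : log) {
            apply(op);
        }
    }
}
```

Replaying `create /quota_bug` followed by `setquota /quota_bug` works on every reload when the index is cleared, but the second replay without clearing fails at the `create`, before the stats node exists.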

236123 No Perforce job exists for this issue. 7 32553
6 years, 2 weeks ago 0|i05xnz:
ZooKeeper ZOOKEEPER-1447

Per-connection network throttling to improve QoS

Improvement Open Major Unresolved Unassigned Thawan Kooburat Thawan Kooburat 16/Apr/12 19:56   16/Apr/12 19:56   3.4.3   server   1 0   Some clients may be heavy bandwidth users. The total network traffic can reach NIC capacity, at which point service quality starts to degrade. We don't want these clients to affect the QoS of other clients sharing the same server.

In this improvement, we are going to add a per-connection throttling mechanism that slows down the network activity of clients with high bandwidth usage. We will add a configurable parameter to limit the maximum bandwidth used to serve client requests. All clients get an equal amount of bandwidth when the system approaches its network capacity limit. When the system is underutilized, throttling has no effect.
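The report does not say how the per-connection limit would be enforced. One policy that matches the stated goals (no effect when underutilized, equal shares near capacity) is max-min fair allocation of a per-interval byte budget. The class below is a hypothetical sketch of that allocation step only, not a ZooKeeper API; the budget would come from the proposed configurable bandwidth limit.

```java
import java.util.Arrays;

/** Hypothetical helper: splits one interval's byte budget across connections. */
final class FairShare {
    private FairShare() {}

    /**
     * Max-min fair allocation. If total demand fits in the budget, every
     * connection gets what it asked for (no throttling); otherwise the budget
     * is split evenly, and any share a light connection does not need is
     * redistributed to the heavier ones.
     */
    static long[] allocate(long[] demand, long budget) {
        int n = demand.length;
        long[] grant = new long[n];
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Serve the smallest demands first so their unused share can be reused.
        Arrays.sort(order, (a, b) -> Long.compare(demand[a], demand[b]));
        long remaining = budget;
        for (int k = 0; k < n; k++) {
            int i = order[k];
            long fair = remaining / (n - k); // equal split of what is left
            grant[i] = Math.min(demand[i], fair);
            remaining -= grant[i];
        }
        return grant;
    }
}
```

With demands of 10, 100 and 100 bytes against a budget of 90, the light client keeps its 10 bytes and the two heavy clients get 40 each; with demands of 10 and 20 against a budget of 100, nothing is throttled.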
236011 No Perforce job exists for this issue. 0 41963
7 years, 49 weeks, 3 days ago 0|i07jq7:
ZooKeeper ZOOKEEPER-1446

C API makes it difficult to implement a timed wait_until_connected method correctly

Bug Open Minor Unresolved Unassigned Stephen Tyree Stephen Tyree 12/Apr/12 15:59   30/May/12 16:30   3.4.3, 3.3.5   c client   0 1   When using the C API, one might be inclined to create a zookeeper_wait_until_connected function that waits up to some timeout for a connected-state event to occur. The code might look like the following (didn't actually compile this):

//------
#include <pthread.h>
#include <time.h>
#include <zookeeper/zookeeper.h>

static pthread_mutex_t kConnectedMutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t kConnectedCondvar = PTHREAD_COND_INITIALIZER;

int zookeeper_wait_until_connected(zhandle_t* zk, const struct timespec* timeout)
{
    /* pthread_cond_timedwait() expects an absolute CLOCK_REALTIME deadline */
    struct timespec abstime;
    clock_gettime(CLOCK_REALTIME, &abstime);
    abstime.tv_sec += timeout->tv_sec;
    abstime.tv_nsec += timeout->tv_nsec;
    if (abstime.tv_nsec >= 1000000000L) { /* keep tv_nsec in [0, 1e9) */
        abstime.tv_sec += abstime.tv_nsec / 1000000000L;
        abstime.tv_nsec %= 1000000000L;
    }

    pthread_mutex_lock(&kConnectedMutex);
    if (zoo_state(zk) == ZOO_CONNECTED_STATE) {
        pthread_mutex_unlock(&kConnectedMutex);
        return 1;
    }

    pthread_cond_timedwait(&kConnectedCondvar, &kConnectedMutex, &abstime);
    int state = zoo_state(zk);
    pthread_mutex_unlock(&kConnectedMutex);
    return (state == ZOO_CONNECTED_STATE);
}

void zookeeper_session_callback(zhandle_t* zh, int type, int state, const char* path, void* arg)
{
    pthread_mutex_lock(&kConnectedMutex);
    if (type == ZOO_SESSION_EVENT && state == ZOO_CONNECTED_STATE) {
        pthread_cond_broadcast(&kConnectedCondvar);
    }
    pthread_mutex_unlock(&kConnectedMutex);
}
//-----

That would work fine (assuming I didn't screw anything up), except that pthread_cond_timedwait can wake up spuriously, so you might not actually wait for the full timeout. The solution is to loop until the condition is met, which might look like the following:

//---
int state = zoo_state(zk);
int result = 0;
while ((state == ZOO_CONNECTING_STATE || state == ZOO_ASSOCIATING_STATE) && result != ETIMEDOUT) {
    result = pthread_cond_timedwait(&kConnectedCondvar, &kConnectedMutex, &abstime);
    state = zoo_state(zk);
}
//---

That would work fine, except that the state might be valid and connecting, yet be neither ZOO_CONNECTING_STATE nor ZOO_ASSOCIATING_STATE: it might be 0 or, as implemented recently courtesy of ZOOKEEPER-1108, 999. Checking for those values makes your code rely on an implementation detail of ZooKeeper, a problem highlighted by that implementation detail changing recently. Could this behavior be documented (via a ZOO_INITIALIZED_STATE or something like that), or could it be supported by the library itself?
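For comparison, the same wait-with-deadline pattern in Java (ZooKeeper's other client language) can be written so that it never enumerates the "still connecting" states: it loops until the state is connected or terminal, re-checking after every wakeup. This is a hypothetical class with an illustrative `State` enum, not the C or Java client API. In C, the analogous loop would test against the documented ZOO_CONNECTED_STATE, ZOO_EXPIRED_SESSION_STATE and ZOO_AUTH_FAILED_STATE constants, assuming those are the only states that should end the wait.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical connection-state holder; the State values are illustrative only. */
class ConnectionWaiter {
    enum State { CONNECTING, CONNECTED, EXPIRED, AUTH_FAILED }

    private final ReentrantLock lock = new ReentrantLock();
    private final Condition changed = lock.newCondition();
    private State state = State.CONNECTING;

    /** Called from the event thread whenever the session state changes. */
    void onStateChange(State newState) {
        lock.lock();
        try {
            state = newState;
            changed.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /**
     * Waits until CONNECTED, a terminal state, or the deadline. The predicate is
     * re-checked after every wakeup, so a spurious wakeup only costs another
     * iteration, and awaitNanos() returns the time still left to wait.
     */
    boolean awaitConnected(long timeout, TimeUnit unit) throws InterruptedException {
        long remaining = unit.toNanos(timeout);
        lock.lock();
        try {
            while (state != State.CONNECTED && !isTerminal(state)) {
                if (remaining <= 0) {
                    return false; // deadline passed
                }
                remaining = changed.awaitNanos(remaining);
            }
            return state == State.CONNECTED;
        } finally {
            lock.unlock();
        }
    }

    private static boolean isTerminal(State s) {
        return s == State.EXPIRED || s == State.AUTH_FAILED;
    }
}
```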
235595 No Perforce job exists for this issue. 1 32554
7 years, 43 weeks, 1 day ago 0|i05xo7:
ZooKeeper ZOOKEEPER-1445

Add support for binary data for zktreeutil

Improvement Open Major Unresolved Thawan Kooburat Thawan Kooburat Thawan Kooburat 10/Apr/12 13:20   05/Feb/20 07:16   3.4.3 3.7.0, 3.5.8 contrib   0 2   zktreeutil does not support binary data: the program fails to import/export znode data that is in binary format. We are going to use the OpenSSL library to perform Base64 encoding so that we can store the data in XML format. OpenSSL seems to be the only widely available library that supports both Base64 encoding and decoding; libxml2 only has encoding support. 235273 No Perforce job exists for this issue. 2 2512
5 years, 51 weeks, 3 days ago 0|i00s9r:
ZooKeeper ZOOKEEPER-1444

Idle session-less connections never time out

Bug Resolved Critical Duplicate Jay Shrauner Jay Shrauner Jay Shrauner 09/Apr/12 21:08   27/Jul/12 01:17 27/Jul/12 01:17 3.3.2, 3.4.3, 3.5.0   server   0 2   A socket connection to the server on which a session is not created will never time out. A misbehaving client that opens and leaks connections without creating sessions will hold open file descriptors on the server.

The existing timeout code is implemented at the session level, but the servers should also track and expire connections at the connection level. The proposed solution is to pull the timeout data-structure handling code (a hashmap of expiry time to sets of objects, plus a simple monotonically incrementing nextExpirationTime) out of SessionTrackerImpl into its own class, so that it can be shared with the connection-level timeouts to be implemented in NIOServerCnxnFactory. Connections can be assigned a small initial timeout (something like 3s) until a session is created, at which point the ServerCnxn session timeout can be used instead.
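A minimal sketch of such a shared class, under assumptions: the name `ExpiryTracker`, its methods and the tick-rounding rule are hypothetical and do not describe SessionTrackerImpl's actual code. Deadlines are rounded up to a tick so that entries expiring at about the same time share one bucket, and `nextExpirationTime` only ever advances by one tick.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical bucketed expiry tracker that sessions and connections could share. */
class ExpiryTracker<E> {
    private final long tickMs;
    private final Map<Long, Set<E>> buckets = new HashMap<>(); // expiry time -> entries
    private final Map<E, Long> expiryOf = new HashMap<>();     // entry -> its bucket
    private long nextExpirationTime;

    ExpiryTracker(long tickMs, long nowMs) {
        this.tickMs = tickMs;
        this.nextExpirationTime = roundToTick(nowMs);
    }

    /** First tick boundary strictly after t. */
    private long roundToTick(long t) {
        return (t / tickMs + 1) * tickMs;
    }

    /** Adds an entry, or moves it to a new bucket, so it expires timeoutMs after nowMs. */
    synchronized void touch(E e, long nowMs, long timeoutMs) {
        remove(e);
        long expiry = roundToTick(nowMs + timeoutMs);
        buckets.computeIfAbsent(expiry, k -> new HashSet<>()).add(e);
        expiryOf.put(e, expiry);
    }

    /** Stops tracking an entry (e.g. the connection was closed). */
    synchronized void remove(E e) {
        Long old = expiryOf.remove(e);
        if (old != null) {
            Set<E> bucket = buckets.get(old);
            bucket.remove(e);
            if (bucket.isEmpty()) {
                buckets.remove(old);
            }
        }
    }

    /** Returns every entry whose bucket is due at nowMs, advancing one tick at a time. */
    synchronized Set<E> poll(long nowMs) {
        Set<E> expired = new HashSet<>();
        while (nextExpirationTime <= nowMs) {
            Set<E> bucket = buckets.remove(nextExpirationTime);
            if (bucket != null) {
                for (E e : bucket) {
                    expiryOf.remove(e);
                }
                expired.addAll(bucket);
            }
            nextExpirationTime += tickMs;
        }
        return expired;
    }
}
```

A connection would be touched with the short pre-session timeout when it is accepted, and touched again with the negotiated session timeout once a session exists; a connection that never creates a session simply falls out of `poll`.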
235167 No Perforce job exists for this issue. 2 32555
7 years, 38 weeks ago Expire idle connections. 0|i05xof:
ZooKeeper ZOOKEEPER-1443

API docs for trunk returns 404

Bug Resolved Major Duplicate Patrick D. Hunt Michi Mutsuzaki Michi Mutsuzaki 04/Apr/12 19:05   09/Oct/13 02:46 09/Oct/13 02:46     documentation   0 0   The "API Docs" link is broken in trunk.

http://zookeeper.apache.org/doc/trunk/
http://zookeeper.apache.org/doc/trunk/api/index.html
234575 No Perforce job exists for this issue. 0 32556
6 years, 24 weeks, 1 day ago 0|i05xon:
ZooKeeper ZOOKEEPER-1442

Uncaught exception handler should exit on a java.lang.Error

Bug Open Minor Unresolved Jeremy Stribling Jeremy Stribling Jeremy Stribling 04/Apr/12 13:09   29/Jul/17 10:34   3.4.3, 3.3.5   java client, server   0 3   The uncaught exception handler registered in NIOServerCnxnFactory and ClientCnxn simply logs exceptions and lets the rest of ZooKeeper go on its merry way. However, errors such as OutOfMemoryError should really crash the program, as they are unrecoverable. If the throwable that reaches the uncaught exception handler is an instance of java.lang.Error, ZK should exit with an error code (in addition to logging the error). 234532 No Perforce job exists for this issue. 3 32557
2 years, 33 weeks, 5 days ago 0|i05xov:
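A sketch of the requested policy for ZOOKEEPER-1442, assuming a hypothetical handler class: it logs every uncaught throwable but exits only for `java.lang.Error`. The exit code and the injectable exit action are assumptions (the latter so the policy can be tested without terminating the JVM), and logging is reduced to `System.err` for brevity.

```java
import java.util.function.IntConsumer;

/** Hypothetical handler: log every uncaught throwable, exit only on java.lang.Error. */
class ExitOnErrorHandler implements Thread.UncaughtExceptionHandler {
    static final int FATAL_EXIT_CODE = 1; // assumed value, not a ZooKeeper constant

    private final IntConsumer exit;

    ExitOnErrorHandler() {
        this(System::exit);
    }

    /** The exit action is injectable so tests can observe it without stopping the JVM. */
    ExitOnErrorHandler(IntConsumer exit) {
        this.exit = exit;
    }

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        System.err.println("Uncaught exception in thread " + t.getName() + ": " + e);
        if (e instanceof Error) {
            // OutOfMemoryError and similar errors leave the process in an unknown state.
            exit.accept(FATAL_EXIT_CODE);
        }
    }
}
```

It could be installed process-wide with `Thread.setDefaultUncaughtExceptionHandler(new ExitOnErrorHandler())`, or per thread with `Thread.setUncaughtExceptionHandler`.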
ZooKeeper ZOOKEEPER-1441

Some test cases are failing because Port bind issue.

Test Closed Major Fixed Andor Molnar kavita sharma kavita sharma 03/Apr/12 08:15   20/May/19 13:50 23/Nov/18 05:52   3.6.0, 3.5.5 server, tests   0 4 0 13800   Test cases fail very frequently because of:

java.net.BindException: Address already in use
at sun.nio.ch.Net.bind(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:111)
at org.apache.zookeeper.server.ServerCnxnFactory.createFactory(ServerCnxnFactory.java:112)
at org.apache.zookeeper.server.quorum.QuorumPeer.<init>(QuorumPeer.java:514)
at org.apache.zookeeper.test.QuorumBase.startServers(QuorumBase.java:156)
at org.apache.zookeeper.test.QuorumBase.setUp(QuorumBase.java:103)
at org.apache.zookeeper.test.QuorumBase.setUp(QuorumBase.java:67)

This may be caused by port assignment. Please give me some suggestions if anyone else is facing the same problem.
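One common way to reduce such collisions, offered only as a sketch and not as the fix that was eventually committed, is to let the operating system choose a free port by binding to port 0. The helper name is hypothetical.

```java
import java.io.IOException;
import java.net.ServerSocket;

/** Hypothetical test helper that lets the OS pick an unused port. */
final class FreePorts {
    private FreePorts() {}

    static int pick() throws IOException {
        // Binding to port 0 asks the kernel for any currently unused port.
        try (ServerSocket s = new ServerSocket(0)) {
            return s.getLocalPort();
        }
    }
}
```

The port is released before the caller binds it, so another process can still grab it in between; tests using this approach should still retry on `java.net.BindException`.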
100% 100% 13800 0 flaky, flaky-test, pull-request-available 234308 No Perforce job exists for this issue. 0 32558
1 year, 16 weeks, 6 days ago
Reviewed
0|i05xp3:
ZooKeeper ZOOKEEPER-1440

Spurious log error messages when QuorumCnxManager is shutting down

Bug Resolved Minor Fixed Jordan Zimmerman Jordan Zimmerman Jordan Zimmerman 01/Apr/12 16:28   12/Mar/14 19:32 12/Mar/14 17:45 3.4.3 3.5.0 quorum, server   0 5   When shutting down the QuorumPeer, the ZK server logs unnecessary errors. See QuorumCnxManager.Listener.run(): ss.accept() throws an exception when the socket is closed, and the catch (IOException e) block logs it as an error. It should first check the shutdown field to see whether the Listener is being shut down. If it is, the exception is expected and no error should be logged. 234094 No Perforce job exists for this issue. 2 32559
6 years, 2 weeks, 1 day ago 0|i05xpb:
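The check suggested in ZOOKEEPER-1440 can be sketched as follows. This is a hypothetical, simplified accept loop rather than QuorumCnxManager.Listener itself: `halt()` sets a volatile `shutdown` flag before closing the socket, and the catch block consults that flag before treating the exception as an error. The error counter exists only to make the behavior observable.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical, simplified accept loop that stays quiet on intentional shutdown. */
class QuietListener implements Runnable {
    private final ServerSocket ss;
    private volatile boolean shutdown = false;
    final AtomicInteger errorsLogged = new AtomicInteger();

    QuietListener(ServerSocket ss) {
        this.ss = ss;
    }

    @Override
    public void run() {
        while (!shutdown) {
            try (Socket client = ss.accept()) {
                // Connection handling omitted from this sketch.
            } catch (IOException e) {
                if (shutdown) {
                    break; // accept() failed because halt() closed the socket: expected
                }
                errorsLogged.incrementAndGet();
                System.err.println("Exception while listening: " + e);
                if (ss.isClosed()) {
                    break; // closed by someone else; retrying would spin
                }
            }
        }
    }

    /** Sets the flag first, then closes the socket, which unblocks accept(). */
    void halt() throws IOException {
        shutdown = true;
        ss.close();
    }
}
```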
ZooKeeper ZOOKEEPER-1439

c sdk: core in log_env for lack of checking the output argument *pwp* of getpwuid_r

Bug Resolved Major Fixed Yubing Yin Yubing Yin Yubing Yin 01/Apr/12 04:15   27/Apr/12 07:00 26/Apr/12 06:59 3.4.3, 3.3.5 3.5.0 c client   1 1 3600 3600 0% linux The getpwuid_r man page says: "return a pointer to a passwd structure, or NULL if the matching entry is not found or an error occurs",
and "The getpwnam_r() and getpwuid_r() functions return zero on success." In other words, the entry may not be found even when getpwuid_r succeeds.

In log_env of zookeeper.c in c sdk:
{{if (!getpwuid_r(uid, &pw, buf, sizeof(buf), &pwp)) {}}
{{LOG_INFO(("Client environment:user.home=%s", pw.pw_dir));}}
{{}}}
pwp is not checked to ensure that an entry was found. In that case pw.pw_dir is not initialized, and the process dumps core in LOG_INFO.
0% 0% 3600 3600 zookeeper 234062 No Perforce job exists for this issue. 1 32560
7 years, 47 weeks, 6 days ago
Reviewed
sdk 0|i05xpj:
ZooKeeper ZOOKEEPER-1438

JMX MBeans for client connections can be orphaned

Bug Open Minor Unresolved Unassigned Todd Lipcon Todd Lipcon 30/Mar/12 00:49   30/Mar/12 00:49   3.4.2   jmx   0 2   I have a functional test that extends from ClientBase, which I'm using to stress test a piece of software that uses ZK underneath. In this test, I want to simulate disconnection events, so I fire up a thread which calls "serverFactory.closeAll()" every 50ms. The clients themselves churn through a lot of sessions as part of the test. When the test completes, the ClientBase teardown method fails, since it sees one or two MBeans "left over" from earlier elapsed sessions. 233878 No Perforce job exists for this issue. 0 32561
7 years, 51 weeks, 6 days ago 0|i05xpr:
ZooKeeper ZOOKEEPER-1437

Client uses session before SASL authentication complete

Bug Resolved Major Fixed Eugene Joseph Koontz Thomas Weise Thomas Weise 28/Mar/12 22:09   18/Feb/16 07:31 09/Sep/12 14:24 3.4.3 3.4.4, 3.5.0 java client   0 19   Found this issue in the context of HBase region server startup, but it can be reproduced with zkCli alone.

getData may occur prior to SaslAuthenticated and fail with NoAuth. This is not expected behavior when the client is configured to use SASL.
233695 No Perforce job exists for this issue. 18 32562
4 years, 5 weeks ago
Reviewed
0|i05xpz:
ZooKeeper ZOOKEEPER-1436

Add ZOO_TIMED_OUT_STATE session event to notify client about timeout during reconnection

Improvement Open Major Unresolved Thawan Kooburat Thawan Kooburat Thawan Kooburat 28/Mar/12 21:07   31/May/12 21:41   3.4.3   c client   1 3   The ZooKeeper C client knows how long its session will last, and periodically pings in order to keep that session alive. However, if it loses its connection, it hops from ensemble member to ensemble member trying to re-establish the session, even after the session timeout has expired.

This patch adds a new session event (ZOO_TIMED_OUT_STATE) that notifies the user that the session timeout has passed and we have been unable to reconnect. The event is one-shot per disconnection and is generated by the C client library itself; the server has no knowledge of this event.

Example use cases:
1. The client can try to reconnect to a different set of observers if it is unable to connect to the original set of observers.

2. The client can quickly stop acting as an active server, since another server may already have taken over the active role while it is trying to reconnect.
patch 233691 No Perforce job exists for this issue. 2 41964
7 years, 42 weeks, 6 days ago 0|i07jqf:
ZooKeeper ZOOKEEPER-1435

cap space usage of default log4j rolling policy

Improvement Resolved Major Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 28/Mar/12 12:33   29/Mar/12 21:08 29/Mar/12 21:08 3.4.3, 3.3.5, 3.5.0 3.5.0 scripts   0 0   Our current log4j log rolling policy (for ROLLINGFILE) doesn't cap the max logging space used. This can be a problem in production systems. See similar improvements recently made in hadoop: HADOOP-8149

For ROLLINGFILE only, I believe we should change the default threshold to INFO and cap the max space to something reasonable, say 5g (max file size of 256mb, max file count of 20). These will be the defaults in log4j.properties, which you would also be able to override from the command line.
233615 No Perforce job exists for this issue. 1 12505
7 years, 51 weeks, 6 days ago
Reviewed
0|i02hxb:
ZooKeeper ZOOKEEPER-1434

zkCli crashes with NPE on stat of non-existent path

Bug Resolved Major Won't Fix Hartmut Lang Wing Yew Poon Wing Yew Poon 26/Mar/12 20:06   26/Mar/12 20:41 26/Mar/12 20:39 3.3.5   java client   0 0   In the command line client (zkCli.sh), when I do

{noformat}
stat /non-existent
{noformat}

the client crashes with

{noformat}
Exception in thread "main" java.lang.NullPointerException
at org.apache.zookeeper.ZooKeeperMain.printStat(ZooKeeperMain.java:130)
at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:722)
at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:581)
at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:353)
at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:311)
at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:270)
{noformat}
233306 No Perforce job exists for this issue. 1 32563
8 years, 2 days ago 0|i05xq7:
ZooKeeper ZOOKEEPER-1433

improve ZxidRolloverTest (test seems flakey)

Improvement Resolved Major Fixed Patrick D. Hunt Wing Yew Poon Wing Yew Poon 26/Mar/12 16:00   30/Mar/12 07:07 29/Mar/12 21:08 3.3.5 3.3.6, 3.4.4, 3.5.0 tests   0 0   In our jenkins job to run the ZooKeeper unit tests, org.apache.zookeeper.server.ZxidRolloverTest sometimes fails.

E.g.,

{noformat}
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /foo0
at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:815)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:843)
at org.apache.zookeeper.server.ZxidRolloverTest.checkNodes(ZxidRolloverTest.java:154)
at org.apache.zookeeper.server.ZxidRolloverTest.testRolloverThenRestart(ZxidRolloverTest.java:211)
{noformat}
233273 No Perforce job exists for this issue. 2 12507
7 years, 51 weeks, 6 days ago
Reviewed
0|i02hxr:
ZooKeeper ZOOKEEPER-1432

Add javadoc and debug logging for checkACL() method in PrepRequestProcessor

Improvement Resolved Major Fixed Eugene Joseph Koontz Eugene Joseph Koontz Eugene Joseph Koontz 23/Mar/12 19:04   26/Apr/12 04:38 26/Apr/12 04:38 3.5.0 3.5.0 server   0 0   I need more logging in the checkACL() method and thought I'd add a javadoc section for the function too, while I'm there. security 233003 No Perforce job exists for this issue. 4 33280
7 years, 49 weeks, 5 days ago 0|i0625j:
ZooKeeper ZOOKEEPER-1431

zkpython: async calls leak memory

Bug Resolved Major Fixed Kapil Thangavelu johan rydberg johan rydberg 23/Mar/12 03:11   19/Jun/12 07:00 18/Jun/12 20:24 3.4.3 3.3.6, 3.4.4, 3.5.0 contrib-bindings   1 7 3600 3600 0% RHEL 6.0, self-built from 3.3.3 sources I'm seeing a memory leak when using the "aget" method.

It leaks tuples and dicts, both containing "stats".

0% 0% 3600 3600 232854 No Perforce job exists for this issue. 4 32564
7 years, 40 weeks, 2 days ago 0|i05xqf:
ZooKeeper ZOOKEEPER-1430

add maven deploy support to the build

Task Closed Blocker Fixed Giridharan Kesavan Patrick D. Hunt Patrick D. Hunt 22/Mar/12 13:28   13/Mar/14 14:17 19/Dec/13 20:30 3.4.4, 3.5.0 3.4.6, 3.5.0 build   0 6   Infra is phasing out the current mechanism we use to deploy maven artifacts. We need to move to repository.apache.org (nexus).

In particular we need to ensure the following artifacts get published:
* zookeeper-3.x.y.jar
* zookeeper-3.x.y-sources.jar
* zookeeper-3.x.y-tests.jar
* zookeeper-3.x.y-javadoc.jar

In 3.4.4/3.4.5 we missed the tests jar which caused headaches for downstream users, such as Hadoop.
232736 No Perforce job exists for this issue. 13 41965
6 years, 2 weeks ago
Reviewed
0|i07jqn:
ZooKeeper ZOOKEEPER-1429

Response packet caching for get request

Improvement Open Minor Unresolved Unassigned Thawan Kooburat Thawan Kooburat 21/Mar/12 13:58   21/Mar/12 13:58   3.4.3   server   1 1   Motivation:
In our scalability testing, we have a large number of clients watching for data changes. All of them fetch data immediately when a watch is fired. We found that GC consumes a significant amount of CPU time in this scenario. In our prototype, we added packet caching for getData() requests and found that GC time was reduced by 40%. The GC we used is the Concurrent Mark Sweep collector.

Design and Implementation:
Similar to our prototype, we plan to add packet caching for getData() requests using an LRU cache. The cache stores serialized responses (data + stat) as ByteBuffers indexed by pathname. A cache entry is invalidated when a set request affects the data.
The data structure we plan to use for the LRU cache is CacheBuilder [http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/cache/CacheBuilder.html] because it provides many tunable features that we may try to use in the future. Currently, the eviction policy will be based on memory size. Alternatively, we can implement it using LinkedHashMap if we do not want to rely on an external library.
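A minimal sketch of the LinkedHashMap alternative mentioned above, with a byte-size eviction budget. The class and method names are hypothetical; serializing data + stat into the ByteBuffer, and wiring `invalidate` into the set-request path, are left out.

```java
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical cache of serialized getData() responses, bounded by total bytes. */
class ResponseCache {
    private final long maxBytes;
    private long usedBytes = 0;
    // accessOrder=true: iteration runs from least- to most-recently accessed entry.
    private final LinkedHashMap<String, ByteBuffer> map = new LinkedHashMap<>(16, 0.75f, true);

    ResponseCache(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    /** Returns the cached response for path, or null; the lookup marks it recently used. */
    synchronized ByteBuffer get(String path) {
        ByteBuffer b = map.get(path);
        return b == null ? null : b.duplicate(); // each caller gets its own position/limit
    }

    synchronized void put(String path, ByteBuffer serialized) {
        ByteBuffer copy = serialized.asReadOnlyBuffer();
        if (copy.remaining() > maxBytes) {
            return; // larger than the whole budget: do not cache
        }
        invalidate(path);
        map.put(path, copy);
        usedBytes += copy.remaining();
        // Evict least-recently-used entries until the byte budget is respected.
        Iterator<Map.Entry<String, ByteBuffer>> it = map.entrySet().iterator();
        while (usedBytes > maxBytes && it.hasNext()) {
            usedBytes -= it.next().getValue().remaining();
            it.remove();
        }
    }

    /** Drops the entry for path, e.g. after a setData() changes that node. */
    synchronized void invalidate(String path) {
        ByteBuffer old = map.remove(path);
        if (old != null) {
            usedBytes -= old.remaining();
        }
    }
}
```

Guava's CacheBuilder would replace the manual eviction loop with its own size-based eviction; the LinkedHashMap version trades those tunables for having no external dependency.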
performance 232580 No Perforce job exists for this issue. 0 41966
8 years, 1 week, 1 day ago 0|i07jqv:
ZooKeeper ZOOKEEPER-1428

Create command line tool to utilize the new classes introduced in ZOOKEEPER-271

New Feature Open Major Unresolved Unassigned Ted Yu Ted Yu 20/Mar/12 13:01   08/Jul/13 13:27       java client, scripts   0 3   See discussion entitled 'ZOOKEEPER-1059 Was: Does the rolling-restart.sh script work?' on zookeeper-dev

HBase bin/rolling-restart.sh depends on zkcli returning a non-zero exit code for a non-existent znode.
Jonathan Hsieh found that rolling-restart.sh no longer works with ZooKeeper 3.4.x.

From Patrick Hunt:

I think what we need is to have a tool that's intended for use both
programmatically and by humans, with more strict requirements about
input, output formatting and command handling, etc... Please see the
work Hartmut has been doing as part of 271 on trunk (3.5.0). Perhaps
we can augment these new classes to also support such a tool. However
it should instead be a true command line tool, rather than a shell.
newbie 232393 No Perforce job exists for this issue. 0 41967
6 years, 37 weeks, 3 days ago 0|i07jr3:
ZooKeeper ZOOKEEPER-1427

Writing to local files is done non-atomically

Bug Resolved Critical Fixed Patrick D. Hunt Todd Lipcon Todd Lipcon 19/Mar/12 17:25   18/Jul/12 07:01 17/Jul/12 17:22 3.4.3 3.4.4, 3.5.0 server   0 10   Currently, the writeLongToFile() function opens the file for truncate, writes the new data, syncs, and then closes. If the process crashes after opening the file but before writing the new data, the file may be left empty, causing ZK to "forget" an earlier promise. Instead, it should use RandomAccessFile to avoid truncating. 232251 No Perforce job exists for this issue. 6 12503
7 years, 36 weeks, 1 day ago
Reviewed
0|i02hwv:
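ZOOKEEPER-1427 proposes RandomAccessFile to avoid truncation. The sketch below swaps in a different, widely used technique: write the new value to a temporary file, force it to disk, then atomically rename it over the target, so a crash leaves either the old or the new content. The class and method names, the one-line text format and the `.tmp` suffix are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

/** Hypothetical crash-safe replacement for a writeLongToFile-style helper. */
final class AtomicLongFile {
    private AtomicLongFile() {}

    static void writeLong(Path target, long value) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        byte[] bytes = (Long.toString(value) + "\n").getBytes(StandardCharsets.UTF_8);
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(bytes);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true); // new contents reach the disk before the rename
        }
        // Only the temp file is ever truncated; the target switches over in one step.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING);
    }

    static long readLong(Path target) throws IOException {
        String text = new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
        return Long.parseLong(text.trim());
    }
}
```

On POSIX file systems the rename replaces the target atomically; whether `ATOMIC_MOVE` replaces an existing file elsewhere is implementation-specific, and fully durable code would also fsync the parent directory.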
ZooKeeper ZOOKEEPER-1426

add version command to the zookeeper server

Improvement Resolved Major Fixed Peter Szecsi Patrick D. Hunt Patrick D. Hunt 18/Mar/12 02:34   21/Jun/19 07:22 31/May/19 16:16 3.3.5 3.6.0 scripts, server   1 6 0 11400   Add a version command to the zkServer.sh.

Hadoop does this by having a special main class: org.apache.hadoop.util.VersionInfo

We could do something similar, hook it into our current version information class (perhaps add main to that class).

Would also need to add a new "version" command to zkServer.sh that calls this.
100% 100% 11400 0 newbie, patch, pull-request-available 232104 No Perforce job exists for this issue. 4 2511
41 weeks, 6 days ago 0|i00s9j:
ZooKeeper ZOOKEEPER-1425

add version command to the zookeeper client shell

Improvement Resolved Major Fixed maoling Patrick D. Hunt Patrick D. Hunt 18/Mar/12 02:31   20/May/19 22:36 20/May/19 16:55   3.6.0 java client, scripts   0 2 0 2400   The client shell is missing a version command. It should return the version, e.g. "3.5.0". 100% 100% 2400 0 pull-request-available 232103 No Perforce job exists for this issue. 1 41968
43 weeks, 2 days ago 0|i07jrb:
ZooKeeper ZOOKEEPER-1424

ZooKeeper will not allow a client to delete a tree when it should allow it

Bug Open Major Unresolved Unassigned Mihai Claudiu Toader Mihai Claudiu Toader 16/Mar/12 17:15   06/Oct/14 09:47   3.4.2   server   0 3   Linux ubuntu 11.10, Zookeeper 3.4.2, One server, Two Java clients Hi all,

While using ZooKeeper at Midokura, we hit an interesting bug in ZooKeeper. We hit it sporadically
while developing some functional tests, so I had to build a test case for it.

I finally created the test case, and I think I have narrowed down the conditions under which it happens.
So I wanted to let you know my findings, since they are somewhat troublesome.

We need:
- one running zookeeper server (didn't test that with a cluster)
let's name this: server

- one running zookeeper client that will create an ephemeral node under the tree created by the next client
let's name this: the ephemeral client

- one running zookeeper client that will create a persistent tree and try to delete that tree
let's name this: the persistent client

What needs to happen is this:

step 1. - the server starts
step 2. - the persistent client connects and creates a tree
step 3. - the ephemeral client connects and adds an ephemeral node under the tree created by the persistent client
step 4. - the persistent client tries to delete the tree recursively (without including the ephemeral node in the multi op)
step 5. - the ephemeral client crashes hard (the equivalent of kill -9)
step 6. - the persistent client tries to delete the tree recursively again (and fails with NotEmpty, even though listing the node shows no children)
- the zookeeper server needs to be restarted in order for this to work.

Step 4 is critical: if we skip it (so there is no earlier failed attempt to remove the tree), the next steps behave as we would expect (i.e. the deletion passes).

Also, no amount of fiddling with the ZooKeeper connection timeouts (between ZooKeeper and the ephemeral client) helps.

If the ephemeral client is shutdown properly it seems like everything will behave properly (even with step 4).

The test code is available here:
https://github.com/mtoadermido/play

It needs ZooKeeper 3.4.2 installed on the system (it uses the installed jars from the deb package to spawn the ZooKeeper server).

The entry point is https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java

There is a lot of boilerplate since I didn't want it to depend on anything from MidoNet, but the interesting part is the BlockingBug.main() method.

It will launch a zookeeper process, an external ephemeral client process, and after that act as the second client.

Available tweaks:
- the zookeeper client timeout for the ephemeral client here:
https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java#L56

- the step 4 here (set to true / false):
https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java#L69

- the shutdown of the ephemeral client (soft aka clean shutdown, hard aka kill -9):
https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java#L88

The output depends on whether the final recursive deletion succeeded:

We hit it !. The clear tree failed.
https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java#L103

"No error :("
https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java#L99


The conclusion is that the bug seems to be inside the ZooKeeper codebase, and it is prone to being triggered by this
particular usage of ZooKeeper combined with the misfortune of having to kill the ephemeral process hard.


232004 No Perforce job exists for this issue. 1 32565
8 years, 1 week, 2 days ago 0|i05xqn:
ZooKeeper ZOOKEEPER-1423

4lw and jmx should expose the size of the datadir/datalogdir

Improvement Resolved Major Fixed Edward Ribeiro Patrick D. Hunt Patrick D. Hunt 16/Mar/12 13:03   13/Jul/15 00:04 13/Jul/15 00:04 3.5.0 3.5.1, 3.6.0 jmx   0 5   There are currently no metrics available on the size of the datadir/datalogdir. These grow without bound unless the cleanup script is run. It would be good to expose these metrics through JMX/4lw so that the size can be monitored. This would also tell people whether cleanup is actually running. In particular, this could be monitored/alerted on by third-party systems (Nagios, Ganglia and the like). newbie 231983 No Perforce job exists for this issue. 8 41969
4 years, 36 weeks, 3 days ago 0|i07jrj:
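For ZOOKEEPER-1423, the measurement itself is straightforward; a hypothetical helper that sums the sizes of all regular files under a directory is sketched below. How the value would be published (an MBean attribute, a 4lw output line) is not shown, and the names are assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical helper for a datadir/datalogdir size metric. */
final class DirSize {
    private DirSize() {}

    /** Total size in bytes of the regular files under dir, recursively. */
    static long bytesUnder(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try {
                                return Files.size(p);
                            } catch (IOException e) {
                                return 0L; // file removed mid-walk (e.g. by a purge)
                            }
                        })
                        .sum();
        }
    }
}
```

Because the walk touches every file, a monitoring endpoint would probably compute this periodically or on demand rather than on every request.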
ZooKeeper ZOOKEEPER-1422

Support _HOST substitution in JAAS configuration

Improvement Resolved Major Implemented Mark Fenes Thomas Weise Thomas Weise 15/Mar/12 22:03   18/Jan/18 11:54 18/Jan/18 11:54 3.4.0       2 4   At the moment a JAAS configuration file needs to be created with the Kerberos principal specified as user/host. It would be much easier for deployment automation if the host portion could be resolved at startup time, as supported in Hadoop (something like user/_HOST instead of user/hostname). A configuration alternative to global JAAS conf would be even better (via direct properties in zoo.cfg?).
231864 No Perforce job exists for this issue. 0 41970
2 years, 9 weeks ago 0|i07jrr:
ZooKeeper ZOOKEEPER-1421

Support for hierarchical ACLs

Improvement Open Major Unresolved Unassigned Thomas Weise Thomas Weise 15/Mar/12 21:47   12/Apr/15 20:07       server   3 7   Using ZK as a service, we need to restrict access to subtrees owned by different tenants. Currently there is no support for hierarchical ACLs, so it is necessary to configure the clients not only with their parent node, but also manage the ACL for each new node created in the subtree. With support for hierarchical ACLs, duplication could be avoided and the setup of the parent nodes with ACL and subsequent control of the same could be split into a separate administrative task.
231862 No Perforce job exists for this issue. 0 41971
4 years, 49 weeks, 4 days ago 0|i07jrz:
ZooKeeper ZOOKEEPER-1420

Kerberos principal to user mapping / authorization

Improvement Open Major Unresolved Unassigned Thomas Weise Thomas Weise 15/Mar/12 21:21   17/May/12 14:41   3.4.0   server   1 3   ZOOKEEPER-938 introduces server configuration options to perform a rudimentary mapping from Kerberos principal to user name:

kerberos.removeHostFromPrincipal
kerberos.removeRealmFromPrincipal

Those are sufficient to make things work for HBase and other server clusters where we cannot include the host name portion into the znode ACL, but it would be better to support a more standard approach to perform the mapping with finer grained control (i.e. do this only for specific matching principals).

Mapping in Hadoop: https://ccp.cloudera.com/display/CDHDOC/Appendix+C+-+Configuring+the+Mapping+from+Kerberos+Principals+to+Short+Names

As an alternative, a matching option at the time of ACL check that can be controlled by the process assigning ACLs to znodes could also serve the purpose. For example, principals:

user/host1@TEST.DOMAIN
user/host2@TEST.DOMAIN

would have access to a znode with ACL set as:

sasl:user/host*@TEST.DOMAIN:cdrwa

This would not require ZK server configuration, but add more runtime overhead.
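The wildcard alternative could be sketched as a small glob matcher applied to the identity part of a `sasl:` ACL (with the scheme and permissions already split off). The class name is hypothetical, and restricting `*` so that it cannot cross `/` or `@` is a design assumption meant to keep a host wildcard from matching a different user or realm.

```java
import java.util.regex.Pattern;

/** Hypothetical matcher for identities such as "user/host*@TEST.DOMAIN". */
final class PrincipalPattern {
    private PrincipalPattern() {}

    /** '*' matches any run of characters other than '/' and '@'; everything else is literal. */
    static Pattern compile(String glob) {
        StringBuilder re = new StringBuilder();
        String[] parts = glob.split("\\*", -1); // -1 keeps trailing empty parts
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                re.append("[^/@]*");
            }
            re.append(Pattern.quote(parts[i]));
        }
        return Pattern.compile(re.toString());
    }

    static boolean matches(String glob, String principal) {
        return compile(glob).matcher(principal).matches();
    }
}
```

Compiling the pattern on every check is the runtime overhead mentioned above; caching compiled patterns per ACL entry would reduce it.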
231860 No Perforce job exists for this issue. 0 41972
8 years, 1 week, 6 days ago 0|i07js7:
ZooKeeper ZOOKEEPER-1419

Leader election never settles for a 5-node cluster

Bug Resolved Blocker Fixed Flavio Paiva Junqueira Jeremy Stribling Jeremy Stribling 15/Mar/12 20:07   19/Mar/12 21:19 19/Mar/12 21:19 3.4.3, 3.5.0 3.4.4, 3.5.0 leaderElection   0 1   64-bit Linux, all nodes running on the same machine (different ports) We have a situation where it seems to my untrained eye that leader election never finishes for a 5-node cluster. In this test, all nodes are ZK 3.4.3 and running on the same server (listening on different ports, of course). The nodes have server IDs of 0, 1, 2, 3, 4. The test brings up the cluster in different configurations, adding in a new node each time. We embed ZK in our application, so when we shut a node down and restart it with a new configuration, it all happens in a single JVM process. Here's our server startup code (for the case where there's more than one node in the cluster):

{code}
if (servers.size() > 1) {
    _log.debug("Starting Zookeeper server in quorum server mode");

    _quorum_peer = new QuorumPeer();
    synchronized(_quorum_peer) {
        _quorum_peer.setClientPortAddress(clientAddr);
        _quorum_peer.setTxnFactory(log);
        _quorum_peer.setQuorumPeers(servers);
        _quorum_peer.setElectionType(_election_alg);
        _quorum_peer.setMyid(_server_id);
        _quorum_peer.setTickTime(_tick_time);
        _quorum_peer.setInitLimit(_init_limit);
        _quorum_peer.setSyncLimit(_sync_limit);
        QuorumVerifier quorumVerifier =
            new QuorumMaj(servers.size());
        _quorum_peer.setQuorumVerifier(quorumVerifier);
        _quorum_peer.setCnxnFactory(_cnxn_factory);
        _quorum_peer.setZKDatabase(new ZKDatabase(log));
        _quorum_peer.start();
    }
} else {
    _log.debug("Starting Zookeeper server in single server mode");
    _zk_server = new ZooKeeperServer();
    _zk_server.setTxnLogFactory(log);
    _zk_server.setTickTime(_tick_time);
    _cnxn_factory.startup(_zk_server);
}
{code}

And here's our shutdown code:

{code}
if (_quorum_peer != null) {
    synchronized(_quorum_peer) {
        _quorum_peer.shutdown();
        FastLeaderElection fle =
            (FastLeaderElection) _quorum_peer.getElectionAlg();
        fle.shutdown();
        try {
            _quorum_peer.getTxnFactory().commit();
        } catch (java.nio.channels.ClosedChannelException e) {
            // ignore
        }
    }
} else {
    _cnxn_factory.shutdown();
    _zk_server.getTxnLogFactory().commit();
}
{code}

The test steps through the following scenarios in quick succession:

Run 1: Start a 1-node cluster, servers=[0]
Run 2: Start a 2-node cluster, servers=[0,3]
Run 3: Start a 3-node cluster, servers=[0,1,3]
Run 4: Start a 4-node cluster, servers=[0,1,2,3]
Run 5: Start a 5-node cluster, servers=[0,1,2,3,4]

It appears that run 5 never elects a leader -- the nodes just keep spewing messages like this (example from node 0):

{noformat}
2012-03-14 16:23:12,775 13308 [WorkerSender[myid=0]] DEBUG org.apache.zookeeper.server.quorum.QuorumCnxManager - There is a connection already for server 2
2012-03-14 16:23:12,776 13309 [QuorumPeer[myid=0]/127.0.0.1:2900] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - Sending Notification: 3 (n.leader), 0x0 (n.zxid), 0x1 (n.round), 3 (recipient), 0 (myid), 0x2 (n.peerEpoch)
2012-03-14 16:23:12,776 13309 [WorkerSender[myid=0]] DEBUG org.apache.zookeeper.server.quorum.QuorumCnxManager - There is a connection already for server 3
2012-03-14 16:23:12,776 13309 [QuorumPeer[myid=0]/127.0.0.1:2900] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - Sending Notification: 3 (n.leader), 0x0 (n.zxid), 0x1 (n.round), 4 (recipient), 0 (myid), 0x2 (n.peerEpoch)
2012-03-14 16:23:12,776 13309 [WorkerSender[myid=0]] DEBUG org.apache.zookeeper.server.quorum.QuorumCnxManager - There is a connection already for server 4
2012-03-14 16:23:12,776 13309 [WorkerReceiver[myid=0]] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - Receive new notification message. My id = 0
2012-03-14 16:23:12,776 13309 [WorkerReceiver[myid=0]] INFO org.apache.zookeeper.server.quorum.FastLeaderElection - Notification: 4 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)
2012-03-14 16:23:12,776 13309 [WorkerReceiver[myid=0]] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - Receive new notification message. My id = 0
2012-03-14 16:23:12,776 13309 [WorkerReceiver[myid=0]] INFO org.apache.zookeeper.server.quorum.FastLeaderElection - Notification: 3 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x2 (n.peerEPoch), LOOKING (my state)
2012-03-14 16:23:12,776 13309 [QuorumPeer[myid=0]/127.0.0.1:2900] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - Adding vote: from=1, proposed leader=3, proposed zxid=0x0, proposed election epoch=0x1
2012-03-14 16:23:12,776 13309 [QuorumPeer[myid=0]/127.0.0.1:2900] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - id: 3, proposed id: 3, zxid: 0x0, proposed zxid: 0x0
2012-03-14 16:23:12,776 13309 [QuorumPeer[myid=0]/127.0.0.1:2900] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - id: 3, proposed id: 3, zxid: 0x0, proposed zxid: 0x0
2012-03-14 16:23:12,776 13309 [QuorumPeer[myid=0]/127.0.0.1:2900] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - id: 3, proposed id: 3, zxid: 0x0, proposed zxid: 0x0
2012-03-14 16:23:12,776 13309 [QuorumPeer[myid=0]/127.0.0.1:2900] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - id: 4, proposed id: 3, zxid: 0x0, proposed zxid: 0x0
2012-03-14 16:23:12,776 13309 [QuorumPeer[myid=0]/127.0.0.1:2900] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - id: 4, proposed id: 3, zxid: 0x0, proposed zxid: 0x0
{noformat}

I'm guessing this means that nodes 3 and 4 are fighting over leadership, but I don't know enough about the leader election code to debug this any further. Attaching a tarball with the logs for each run and the data directories for each node (though I don't think any data is being written to ZK during the test).
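For reading the log above: the "id: ..., proposed id: ..." lines appear to come from the vote comparison in FastLeaderElection, which in 3.4 prefers the higher peer epoch, then the higher zxid, then the higher server id. The sketch below restates that ordering and the strict-majority test performed by QuorumMaj in plain Java. It is a simplified illustration (it ignores weights and observers), not the actual election code, and VoteOrdering is a hypothetical name.

```java
import java.util.Map;

/** Simplified restatement of FLE-style vote ordering plus a majority check. */
final class VoteOrdering {
    /** True if the new vote (newId, newZxid, newEpoch) should replace the current vote. */
    static boolean isBetter(long newId, long newZxid, long newEpoch,
                            long curId, long curZxid, long curEpoch) {
        if (newEpoch != curEpoch) return newEpoch > curEpoch; // higher peer epoch wins first
        if (newZxid != curZxid) return newZxid > curZxid;     // then the more recent zxid
        return newId > curId;                                 // finally the higher server id
    }

    /** Strict majority: more than half of the ensemble must agree. */
    static boolean hasQuorum(int ensembleSize, int votes) {
        return votes > ensembleSize / 2;
    }

    /** Counts the received votes (keyed by sender sid) that name the given leader. */
    static int votesFor(Map<Long, Long> proposedLeaderBySid, long leader) {
        int n = 0;
        for (long proposed : proposedLeaderBySid.values()) {
            if (proposed == leader) n++;
        }
        return n;
    }
}
```

With values like those in node 0's log, a vote for 4 carrying peer epoch 0x0 does not displace a current vote for 3 carrying peer epoch 0x2, and a 5-node ensemble needs 3 matching votes before any node can settle.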
231852 No Perforce job exists for this issue. 4 12509
8 years, 1 week, 2 days ago 0|i02hy7:
ZooKeeper ZOOKEEPER-1418

Just a bug in the tutorial code on the website

Bug Open Minor Unresolved Joe Gamache Joe Gamache Joe Gamache 15/Mar/12 17:11   22/Mar/12 10:49   3.4.3   documentation   0 0   When I ran the Queue example from here: http://zookeeper.apache.org/doc/trunk/zookeeperTutorial.html
The producer created entries of the form: /app1/element0000000001...
but the consumer tried to consume nodes of the form: /app1/element1...

Attaching a patch with the corrected file.
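Sequential znodes get a 10-digit, zero-padded counter appended to their name (the %010d format described in the ZooKeeper programmer's guide), so a name rebuilt from a parsed integer, such as element1, never matches. A hedged sketch of a consumer-side fix with hypothetical helper names: keep the full child name returned by getChildren instead of rebuilding it, or pad the number back to 10 digits.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical helpers for a queue consumer working with sequential child names. */
final class QueueChildren {
    /** Returns the full child name with the smallest sequence number, or null if none match. */
    static String smallestElement(List<String> children, String prefix) {
        String best = null;
        long bestSeq = Long.MAX_VALUE;
        for (String child : children) {
            if (!child.startsWith(prefix)) continue;
            long seq = Long.parseLong(child.substring(prefix.length()));
            if (seq < bestSeq) {
                bestSeq = seq;
                best = child; // keep the original, zero-padded name
            }
        }
        return best;
    }

    /** Rebuilds a child name using the 10-digit zero padding of sequential znodes. */
    static String elementName(String prefix, long seq) {
        return prefix + String.format(Locale.ROOT, "%010d", seq);
    }
}
```

The consumer would then read root + "/" + smallestElement(children, "element") rather than root + "/element" + min.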
231819 No Perforce job exists for this issue. 1 32566
8 years, 1 week ago 0|i05xqv:
ZooKeeper ZOOKEEPER-1417

investigate differences in client last zxid handling btw c and java clients

Bug Resolved Major Fixed Thawan Kooburat Patrick D. Hunt Patrick D. Hunt 15/Mar/12 13:05   06/Jun/13 13:21 06/Jun/13 12:54 3.4.0 3.5.0 c client, java client   0 5   In ZOOKEEPER-1412 it was identified that the c and java clients handle updating the last zxid seen a bit differently. ZOOKEEPER-1412 fixed a bug associated with this, however there are still some differences that should be investigated. 231776 No Perforce job exists for this issue. 2 32567
6 years, 42 weeks ago 0|i05xr3:
ZooKeeper ZOOKEEPER-1416

Persistent Recursive Watch

Improvement Resolved Major Fixed Jordan Zimmerman Phillip Liu Phillip Liu 14/Mar/12 18:52   11/Nov/19 21:24 08/Nov/19 11:30   3.6.0 c client, documentation, java client, server   22 30 1814400 1750800 63600 3% ZOOKEEPER-2871 h4. The Problem
A ZooKeeper Watch can be placed on a single znode, and when the znode changes a Watch event is sent to the client. If there are thousands of znodes being watched, then when a client (re)connects it has to send thousands of watch requests. At Facebook, we have this problem storing information for thousands of db shards. Consequently, a naming service that consumes the db shard definitions issues thousands of watch requests each time the service starts or changes its client watcher.

h4. Proposed Solution
We add the notion of a Persistent Recursive Watch in ZooKeeper. Persistent means no Watch reset is necessary after a watch-fire. Recursive means the Watch applies to the node and descendant nodes. A Persistent Recursive Watch behaves as follows:

# Recursive Watch supports all Watch semantics: CHILDREN, DATA, and EXISTS.
# CHILDREN and DATA Recursive Watches can be placed on any znode.
# EXISTS Recursive Watches can be placed on any path.
# A Recursive Watch behaves like an auto-watch registrar on the server side. Setting a Recursive Watch means setting watches on all descendant znodes.
# When a watch on a descendant fires, no subsequent event is fired until a corresponding getData(..) on the znode is called; then the Recursive Watch automatically re-applies the watch on the znode. This maintains the existing Watch semantics on an individual znode.
# A Recursive Watch overrides any watches placed on a descendant znode. In practice, this means the Recursive Watch's Watcher callback is the one receiving the event, and the event is delivered exactly once.

A goal here is to minimize the number of semantic changes. The guarantee of no intermediate watch event until data is read will be maintained. The only difference is that we will automatically re-add the watch after the read. At the same time, we add the convenience of not having to set multiple watches for sibling znodes, which in turn reduces the number of watch messages sent from the client to the server.

There are some implementation details that need to be hashed out. Initial thinking is to have the Recursive Watch create per-node watches. This will cause a lot of watches to be created on the server side. Currently, each watch is stored as a single bit in a bit set relative to a session (up to 3 bits per client per znode). If there are 100m znodes with 100k clients, each watching all nodes, then this strategy will consume approximately 3.75TB of RAM distributed across all Observers. That seems expensive.
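The 3.75TB estimate follows from one bit per watch type per client per znode, counted in decimal terabytes:

```latex
10^{8}\ \text{znodes} \times 10^{5}\ \text{clients} \times 3\ \text{bits}
  = 3 \times 10^{13}\ \text{bits}
  = \frac{3 \times 10^{13}}{8}\ \text{bytes}
  = 3.75 \times 10^{12}\ \text{bytes} \approx 3.75\ \text{TB}
```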

Alternatively, each time a watch event from a Recursive Watch fires, the path can be added to a blacklist of paths for which no Watch events are sent, regardless of Watch settings. The memory utilization is then proportional to the number of outstanding reads; in the worst case it is 1/3 * 3.75TB using the parameters given above.
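To make the two bookkeeping ideas concrete, here is a single-process Java sketch (a hypothetical class, not the ZooKeeper implementation): a change matches a Recursive Watch if the changed path or any of its ancestors carries one, and the blacklist approach records a path when its event fires and clears it when the client reads the znode. Per-session bit sets, the CHILDREN/DATA/EXISTS distinctions, and networking are all left out, and paths are assumed to be absolute.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of Recursive Watch matching plus the "blacklist" suppression idea. */
final class RecursiveWatchSketch {
    private final Set<String> recursiveRoots = new HashSet<>(); // paths carrying a Recursive Watch
    private final Set<String> awaitingRead = new HashSet<>();   // blacklist: fired, not yet read

    void addRecursiveWatch(String root) { recursiveRoots.add(root); }

    /** True if the (absolute) path itself or any ancestor carries a Recursive Watch. */
    boolean isCovered(String path) {
        String p = path;
        while (true) {
            if (recursiveRoots.contains(p)) return true;
            if (p.equals("/")) return false;
            int slash = p.lastIndexOf('/');
            p = (slash == 0) ? "/" : p.substring(0, slash); // step up to the parent
        }
    }

    /** On a change, returns true if an event should be sent; repeats are suppressed until a read. */
    boolean onChange(String path) {
        if (!isCovered(path)) return false;
        return awaitingRead.add(path); // add() returns false if the path is already blacklisted
    }

    /** On a read of the znode (e.g. getData), lifts the suppression so the watch is re-armed. */
    void onRead(String path) { awaitingRead.remove(path); }
}
```

Memory in this sketch grows with the size of awaitingRead, i.e. with the number of fired-but-unread paths, which is the property the blacklist alternative relies on.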

Otherwise, the guarantee of no intermediate watch event until read has to be relaxed. If the server can send watch events regardless of whether one has already been fired without a corresponding read, then it can simply fire watch events without tracking.
3% 3% 63600 1750800 1814400 pull-request-available 231654 No Perforce job exists for this issue. 2 41973
18 weeks, 6 days ago 0|i07jsf:
ZooKeeper ZOOKEEPER-1415

Zookeeper broadcasts host's hostname instead of IP when ecf.exported.containerfactoryargs property is not set

Bug Open Minor Unresolved Unassigned Stefano Ghio Stefano Ghio 14/Mar/12 15:06   14/Mar/12 15:06           2 1   Any OS, any Java version. The issue presents itself when using the osgi bundles org.apache.hadoop.zookeeper and org.eclipse.ecf.provider.zookeeper inside an Eclipse Equinox framework. I did not test on any other versions. Not setting the ecf.exported.containerfactoryargs property when publishing an OSGi service through Zookeeper results in the service being published under the host's hostname instead of its IP. This means that hosts not able to correctly resolve that hostname cannot connect to its ZooKeeper instance. It would be desirable to use the IP instead of the hostname when that property is purposely left blank e.g. when it is unknown where the application will be deployed. osgi 231619 No Perforce job exists for this issue. 0 32568
8 years, 2 weeks, 1 day ago osgi 0|i05xrb:
ZooKeeper ZOOKEEPER-1414

ZOOKEEPER-1833 QuorumPeerMainTest.testQuorum, testBadPackets are failing intermittently

Sub-task Closed Minor Fixed Rakesh Radhakrishnan Rakesh Radhakrishnan Rakesh Radhakrishnan 14/Mar/12 10:48   13/Mar/14 14:17 09/Jan/14 14:24 3.4.3, 3.5.0 3.4.6, 3.5.0 server, tests   0 4   The QuorumPeerMainTest.testQuorum, testBadPackets testcases are failing intermittently due to the wrong ZKClient usage pattern.

Saw the following ConnectionLoss on 3.4 version:
{noformat}
KeeperErrorCode = ConnectionLoss for /foo_q1
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /foo_q1
at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:657)
at org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testBadPackets(QuorumPeerMainTest.java:212)
{noformat}

Since the ZooKeeper connection is established asynchronously through ClientCnxn, the client should wait for the 'KeeperState.SyncConnected' event before using it. These test cases do not wait for the connection, as in:
{noformat}
ZooKeeper zk = new ZooKeeper("127.0.0.1:" + CLIENT_PORT_QP1,
ClientBase.CONNECTION_TIMEOUT, this);
zk.create("/foo_q1", "foobar1".getBytes(), Ids.OPEN_ACL_UNSAFE,
CreateMode.PERSISTENT);
{noformat}
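A minimal sketch of the wait-for-connection pattern using a CountDownLatch. To keep it self-contained, it uses a hypothetical stand-in enum instead of org.apache.zookeeper.Watcher.Event.KeeperState; in the tests, the Watcher passed to new ZooKeeper(...) would forward event.getState() to onState, and the test would call awaitConnected(ClientBase.CONNECTION_TIMEOUT) before zk.create(...).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for org.apache.zookeeper.Watcher.Event.KeeperState. */
enum KeeperState { Disconnected, SyncConnected, Expired }

/** Lets a caller block until the client reports SyncConnected. */
final class ConnectedSignal {
    private final CountDownLatch connected = new CountDownLatch(1);

    /** Call from the Watcher's process() callback with the event's state. */
    void onState(KeeperState state) {
        if (state == KeeperState.SyncConnected) {
            connected.countDown();
        }
    }

    /** Returns true once connected, or false if the timeout elapses or the wait is interrupted. */
    boolean awaitConnected(long timeoutMs) {
        try {
            return connected.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```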
test 231558 No Perforce job exists for this issue. 1 41974
6 years, 2 weeks ago 0|i07jsn:
ZooKeeper ZOOKEEPER-1413

Use on-disk transaction log for learner sync up

Improvement Resolved Minor Fixed Thawan Kooburat Thawan Kooburat Thawan Kooburat 13/Mar/12 17:42   14/Oct/13 11:55 01/Jul/13 13:22 3.4.3 3.5.0 server   0 11   Motivation:
The learner syncs up with the leader by retrieving the committed log from the leader. Currently, the leader keeps only the 500 most recently committed log entries in memory. If the learner falls behind by more than 500 updates, the leader sends the entire snapshot to the learner.

Given the size of the snapshot for some of our Zookeeper deployments (~10G), it is prohibitively expensive to send the entire snapshot over the network. Additionally, our Zookeeper may serve more than 4K updates per second. As a result, a network hiccup of less than a second will cause the learner to fall back to a snapshot transfer.

Design:
Instead of looking only at the committed log in memory, the leader will also look at the transaction log on disk. The amount of transaction log kept on disk is configurable, and the current default is 100k. This will allow Zookeeper to tolerate longer temporary network failures before initiating a snapshot transfer.

Implementation:
We plan to add an interface to the persistence layer that can be used to retrieve proposals from the on-disk transaction log. These proposals can then be sent to the learner using the existing protocol.
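A simplified sketch of the leader-side decision this design implies, with hypothetical names: replay from memory when the learner's last zxid is inside the in-memory committed log, replay from the on-disk transaction log when it is older but still on disk, and send a snapshot otherwise. The case where the learner is ahead of the leader (which needs a truncation) is deliberately ignored here.

```java
/** Hypothetical outcome of the leader's sync decision in this sketch. */
enum SyncMode { DIFF_FROM_MEMORY, DIFF_FROM_DISK, SNAP }

final class LearnerSyncPlanner {
    /**
     * peerZxid: last zxid the learner has.
     * [minCommitted, maxCommitted]: zxids held in the in-memory committed log.
     * minOnDisk: oldest zxid still available in the on-disk transaction log.
     */
    static SyncMode choose(long peerZxid, long minCommitted, long maxCommitted, long minOnDisk) {
        if (peerZxid >= minCommitted && peerZxid <= maxCommitted) {
            return SyncMode.DIFF_FROM_MEMORY; // cheap: proposals are already in memory
        }
        if (peerZxid >= minOnDisk && peerZxid <= maxCommitted) {
            return SyncMode.DIFF_FROM_DISK;   // proposed: read proposals back from the txn log
        }
        return SyncMode.SNAP;                 // too far behind (learner-ahead case not modeled)
    }
}
```

With the numbers in this issue, 500 in-memory entries at 4,000 updates per second cover only about 500 / 4000 = 0.125 s of writes, which is why sub-second hiccups already force a snapshot; if the 100k on-disk default counts transactions, it would cover roughly 25 s at the same rate.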
performance, quorum 231470 No Perforce job exists for this issue. 9 41975
6 years, 23 weeks, 3 days ago 0|i07jsv:
ZooKeeper ZOOKEEPER-1412

java client watches inconsistently triggered on reconnect

Bug Resolved Blocker Fixed Patrick D. Hunt Botond Hejj Botond Hejj 12/Mar/12 05:13   04/Jun/12 19:33 15/Mar/12 13:06 3.3.3, 3.3.4, 3.4.0, 3.4.1, 3.4.2, 3.4.3 3.3.5, 3.4.4, 3.5.0 server   0 6   I've observed an inconsistent behavior in java client watches. The inconsistency relates to the behavior after the client reconnects to the zookeeper ensemble.

After the client reconnects to the ensemble, only those watches should trigger that would also have been triggered if the connection had not been lost. This means that if I watch for changes in node /foo and there is no change there, then my watch should not be triggered on reconnecting to the ensemble.
This is not always the case in the java client.

I've debugged the issue and located the case in which the watch is always triggered on reconnect. This consistently happens if I connect to a follower in the ensemble and don't do any operation that goes through the leader.
Looking at the code, I see that the client stores the lastzxid and sends it with its requests. This is 0 on startup and is updated every time from the server replies. This lastzxid is also sent to the server after reconnect, together with the watches. The server decides which watches to trigger based on this lastzxid, presumably because it represents the last known state of the client. If this lastzxid is 0, then all the watches are triggered.
I checked why this lastzxid is 0. I thought it shouldn't be, since there was already a request to the server to set the watch, and in the reply the server could have sent back the zxid, but it turns out that it sends back just 0. Looking at the server code, I see that for requests which don't go through the leader, the follower server just sends back the same zxid that the client sent.
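The server-side decision described above can be sketched roughly as follows for a data watch. This is a simplified, hypothetical model (a map from path to the zxid of the node's last modification), not ZooKeeper's code: the watch fires if the node changed after the zxid the client reports, so a reported zxid of 0 fires it for essentially every existing node.

```java
import java.util.Map;

/** Hypothetical, simplified model of how a server could treat a re-registered data watch. */
final class ReconnectWatchCheck {
    /**
     * mzxidByPath: zxid of each node's last modification (absent if the node no longer exists).
     * lastZxidSeen: the zxid the client reports on reconnect.
     */
    static String dataWatchOutcome(Map<String, Long> mzxidByPath, String path, long lastZxidSeen) {
        Long mzxid = mzxidByPath.get(path);
        if (mzxid == null) {
            return "FIRE NodeDeleted";      // node vanished while the client was away
        }
        if (mzxid > lastZxidSeen) {
            return "FIRE NodeDataChanged";  // changed after the client's last known state
        }
        return "RE-REGISTER";               // nothing missed: keep watching silently
    }
}
```

The second test case below is the reported bug: /foo did not change while the client was disconnected, but because the follower echoed 0 back, the client reports 0 and the watch fires anyway.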
231227 No Perforce job exists for this issue. 6 12510
7 years, 42 weeks, 3 days ago
Reviewed
0|i02hyf:
ZooKeeper ZOOKEEPER-1411

ZOOKEEPER-107 Consolidate membership management, distinguish between static and dynamic configuration parameters

Sub-task Resolved Major Fixed Alexander Shraer Alexander Shraer Alexander Shraer 08/Mar/12 20:02   01/May/13 22:30 02/Apr/13 02:33   3.5.0 server   0 7   Currently every server has a different static configuration file. This patch distinguishes between dynamic parameters, which are now in a separate "dynamic configuration file", and static parameters which are in the usual file. The config file points to the dynamic config file by specifying "dynamicConfigFile=...". In the first stage (this patch), all cluster membership definitions are in the dynamic config file, but in the future additional parameters may be moved to the dynamic file.

Backward compatibility ensures that you can still use a single config file if you'd like. Only when the config is changed (once ZK-107 is in) is a dynamic file automatically created and the necessary parameters moved to it.

This patch also moves all membership parsing and management into the QuorumVerifier classes, and removes QuorumPeer.quorumPeers.
The cluster membership is contained in QuorumPeer.quorumVerifier. QuorumVerifier was expanded and now has methods such as getAllMembers(), getVotingMembers(), getObservingMembers().
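A hypothetical sketch of the static/dynamic split described above, operating on a flat key/value config. Which keys count as dynamic is an assumption here: the membership keys server.N plus the hierarchical-quorum keys group.N and weight.N. The static part is pointed at the dynamic file through the dynamicConfigFile key named in this issue; the file name used in the test is made up.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: split a flat config into static and dynamic parts. */
final class ConfigSplitter {
    /** Holder for the two halves of the configuration. */
    static final class SplitConfig {
        final Map<String, String> staticPart = new TreeMap<>();
        final Map<String, String> dynamicPart = new TreeMap<>();
    }

    /** Treats membership keys as dynamic; this key set is an assumption for illustration. */
    static boolean isDynamic(String key) {
        return key.startsWith("server.") || key.startsWith("group.") || key.startsWith("weight.");
    }

    /** Moves membership keys to the dynamic part and points the static part at the dynamic file. */
    static SplitConfig split(Map<String, String> config, String dynamicFile) {
        SplitConfig out = new SplitConfig();
        for (Map.Entry<String, String> e : config.entrySet()) {
            Map<String, String> target = isDynamic(e.getKey()) ? out.dynamicPart : out.staticPart;
            target.put(e.getKey(), e.getValue());
        }
        out.staticPart.put("dynamicConfigFile", dynamicFile);
        return out;
    }
}
```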

230933 No Perforce job exists for this issue. 12 33281
6 years, 51 weeks, 2 days ago
Reviewed
0|i0625r:
ZooKeeper ZOOKEEPER-1410

ZOOKEEPER-1407 Support GetData and GetChildren in Multi for C client

Sub-task Open Major Unresolved Unassigned Ted Yu Ted Yu 08/Mar/12 19:52   10/May/14 04:58       c client   0 0   This is task for C client portion of ZOOKEEPER-1407 230931 No Perforce job exists for this issue. 0 41976
8 years, 2 weeks, 6 days ago 0|i07jt3:
ZooKeeper ZOOKEEPER-1409

CLI: deprecate ls2 command

Improvement Resolved Minor Duplicate Hartmut Lang Hartmut Lang Hartmut Lang 08/Mar/12 15:26   02/Apr/14 16:09 02/Apr/14 16:09   3.5.0 java client   0 2   In the CLI mark ls2 command as deprecated.
Instead add a -s option to the ls command.
The options for ls would be:
ls [-s] [-w] path
-s stat
-w watch
230891 No Perforce job exists for this issue. 2 41977
5 years, 51 weeks, 1 day ago 0|i07jtb:
ZooKeeper ZOOKEEPER-1408

CLI: sort output of ls command

Improvement Resolved Minor Fixed Hartmut Lang Hartmut Lang Hartmut Lang 08/Mar/12 15:16   02/Apr/14 16:09 28/Mar/14 23:34   3.5.0 java client   0 3   Sort the output of the ls-command in the CLI. And remove the [] frame.

Example: change output of "ls /"
[test1, aa3, zkc1, aa2, aa1, zookeeper]
to
aa1, aa2, aa3, test1, zkc1, zookeeper
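A small sketch of the proposed output format (a hypothetical helper, not the CLI's code): sort the child names and join them with ", " instead of printing the list's default [...] form.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical formatter for the proposed "ls" output: sorted names, no [] frame. */
final class LsFormatter {
    static String format(List<String> children) {
        List<String> sorted = new ArrayList<>(children); // copy so the caller's list is untouched
        Collections.sort(sorted);                         // natural (lexicographic) String order
        return String.join(", ", sorted);
    }
}
```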
230889 No Perforce job exists for this issue. 4 41978
5 years, 51 weeks, 5 days ago The output of ls-command in CLI does not contain the []-frame any more. Instead the nodes are sorted.
Incompatible change
0|i07jtj:
ZooKeeper ZOOKEEPER-1407

Support GetData and GetChildren in Multi

Improvement Resolved Major Workaround Ted Yu Ted Yu Ted Yu 07/Mar/12 11:05   11/Sep/19 16:32 14/Jun/19 20:39     java client, server   4 7   ZOOKEEPER-1410, ZOOKEEPER-3361 There is a use case in which GetData and GetChildren would participate in Multi.
We should add support for this case.
1% 16200 1193400 1209600 230709 No Perforce job exists for this issue. 5 41979
39 weeks, 5 days ago ZOOKEEPER-3402 should resolve this 0|i07jtr:
ZooKeeper ZOOKEEPER-1406

dpkg init scripts don't restart - missing check_priv_sep_dir

Bug Resolved Major Fixed Chris Beauchamp Chris Beauchamp Chris Beauchamp 06/Mar/12 04:14   18/Mar/12 07:00 18/Mar/12 02:50 3.4.3 3.4.4, 3.5.0 scripts   0 1   Linux Ubuntu 10.4 lucid - presumably affects debian too, but not tested here The included init.d script for dpkg creation doesn't restart.

It exits with the following error:

{quote}
\# /etc/init.d/zookeeper restart
/etc/init.d/zookeeper: 127: check_privsep_dir: not found
{quote}

Also the actual zkServer.sh line in restart has a path of .../bin/ rather than .../sbin/

230488 No Perforce job exists for this issue. 1 32569
8 years, 1 week, 4 days ago
Reviewed
0|i05xrj:
ZooKeeper ZOOKEEPER-1405

leader election recipe sample code - dispatchEvent invocations can get out of order

Bug Open Major Unresolved Unassigned Robert Varga Robert Varga 05/Mar/12 08:42   06/Mar/12 07:10   3.4.3   recipes   0 3   Since the process method is not synchronized in org.apache.zookeeper.recipes.election.LeaderElectionSupport, there is a race condition in which events coming in from the watch may overtake the events dispatched during the start method.

A solution to ensure that events dispatched during the start method are handled before any watch-based events is to make the process method synchronized.
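A simplified model of the proposed fix, with hypothetical event names and class: because start() and process() share one monitor, a watch event that fires while start() is still dispatching has to wait until start() returns, so it can never overtake start()'s events.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the ordering fix: start() and process() share one monitor. */
final class OrderedDispatcher {
    private final List<String> dispatched = new ArrayList<>();
    private Thread watchThread;

    synchronized void start() {
        dispatch("START");
        // Simulate registering a watch whose event fires at once on another thread.
        watchThread = new Thread(() -> process("WATCH_EVENT"));
        watchThread.start();
        dispatch("OFFER_COMPLETE");
    }

    /** Watch callback: because it is synchronized, it cannot run until start() returns. */
    synchronized void process(String event) { dispatch(event); }

    private void dispatch(String event) { dispatched.add(event); }

    synchronized List<String> events() { return new ArrayList<>(dispatched); }

    /** Waits for the simulated watch event to be delivered. */
    void awaitWatch() {
        try {
            watchThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Without synchronized on process(), WATCH_EVENT could land between START and OFFER_COMPLETE, which is the race described in this issue.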
230364 No Perforce job exists for this issue. 0 32570
8 years, 3 weeks, 2 days ago 0|i05xrr:
ZooKeeper ZOOKEEPER-1404

leader election pseudo code probably incorrect

Bug Resolved Major Fixed Unassigned Robert Varga Robert Varga 05/Mar/12 07:05   14/Dec/12 17:11 14/Dec/12 17:11 3.4.3   documentation   0 4   The pseudo code for leader election in the recipes.html page of 3.4.3 documentation is the following...

{quote}
Let ELECTION be a path of choice of the application. To volunteer to be a leader:

1. Create znode z with path "ELECTION/guid-n_" with both SEQUENCE and EPHEMERAL flags;

2. Let C be the children of "ELECTION", and i be the sequence number of z;

3. Watch for changes on "ELECTION/guid-n_j", where j is the {color:red}*smallest*{color} sequence number such that j < i and n_j is a znode in C;

Upon receiving a notification of znode deletion:

1. Let C be the new set of children of ELECTION;

2. If z is the smallest node in C, then execute leader procedure;

3. Otherwise, watch for changes on "ELECTION/guid-n_j", where j is the {color:red}*smallest*{color} sequence number such that j < i and n_j is a znode in C;
{quote}


I think *highest* should appear instead of {color:red}*smallest*{color} in step 3 of both procedures.
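Read with *highest*, step 3 means watching the immediate predecessor, so each deletion wakes only the next candidate instead of every candidate watching the same lowest node (the herd effect). A small sketch of that selection with hypothetical names; it assumes child names end in "_" followed by the sequence number, as the "guid-n_" prefix implies.

```java
import java.util.List;

/** Hypothetical helper for the election recipe: which znode should a candidate watch? */
final class ElectionPredecessor {
    /** Parses the sequence suffix after the last '_' (e.g. "guid-n_0000000003" gives 3). */
    static long sequenceOf(String name) {
        return Long.parseLong(name.substring(name.lastIndexOf('_') + 1));
    }

    /** Returns the child with the highest sequence number below ours, or null if ours is smallest. */
    static String predecessor(List<String> children, String self) {
        long mine = sequenceOf(self);
        String best = null;
        long bestSeq = Long.MIN_VALUE;
        for (String child : children) {
            long seq = sequenceOf(child);
            if (seq < mine && seq > bestSeq) {
                bestSeq = seq;
                best = child;
            }
        }
        return best; // null means z is the smallest node: execute the leader procedure
    }
}
```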
230354 No Perforce job exists for this issue. 0 32571
7 years, 14 weeks, 6 days ago 0|i05xrz:
ZooKeeper ZOOKEEPER-1403

zkCli.sh script quoting issue

Bug Resolved Minor Fixed James Page James Page James Page 02/Mar/12 06:12   18/Mar/12 07:00 18/Mar/12 03:04 3.3.4, 3.4.3 3.3.6, 3.4.4, 3.5.0 scripts   0 1   Ubuntu/Debian The zkCli.sh script included with zookeeper doesn't quote its parameters
correctly when passing them on to the java program.

This causes issues with arguments with spaces and such.
230102 No Perforce job exists for this issue. 1 32572
8 years, 1 week, 4 days ago
Reviewed
0|i05xs7:
ZooKeeper ZOOKEEPER-1402

Upload Zookeeper package to Maven Central

Improvement Resolved Minor Done Flavio Paiva Junqueira Igor Lazebny Igor Lazebny 01/Mar/12 11:32   08/Oct/15 13:13 23/Sep/15 12:41 3.3.4 3.4.7     4 9   It would be great to make Zookeeper package available in Maven Central as other Apache projects do (Camel, CXF, ActiveMQ, Karaf, etc).
That would simplify usage of this package in maven builds.
229990 No Perforce job exists for this issue. 0 41980
4 years, 24 weeks ago this is just a jute plugin (perhaps we can open up the github it sits in). It made the maven pom file slightly cleaner and is a good template

used by the next major patch that takes the previous mavenization patch and moves folders to match maven's expected structure. This takes out much of the custom config (but this patch is not yet 100% complete; still tweaking to handle all cases 100% for sure). putting both up for examination to see how people feel about moving to the maven structure (I show one way to use the modules also)
0|i07jtz:
ZooKeeper ZOOKEEPER-1401

Extract generally useful client utilities from CLI code

Improvement Open Major Unresolved Unassigned Thomas Weise Thomas Weise 24/Feb/12 13:12   17/Mar/12 15:59       java client   0 0   There are a bunch of things that would be useful/reusable from ZK Java client, such as ACL parsing. Also, it would be nice to see other utilities for dealing with path creation ("mkdir -p ...") readily available for clients rather than implementing in downstream projects. Some of this can be seen in HIVE-2712.
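As an illustration of the kind of reusable helper meant here, a hypothetical "mkdir -p" path expansion: it lists every ancestor znode that would have to exist, in creation order. A client-side helper would then call zk.create(p, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT) for each entry and ignore KeeperException.NodeExistsException; that wiring is left out so the sketch stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical path helper: the znodes a "mkdir -p" style call would create, in order. */
final class ZkPathUtil {
    static List<String> pathsToCreate(String path) {
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("path must be absolute: " + path);
        }
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String part : path.split("/")) {
            if (part.isEmpty()) continue; // skip the leading "" and any "//"
            current.append('/').append(part);
            result.add(current.toString());
        }
        return result;
    }
}
```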
229263 No Perforce job exists for this issue. 0 41981
8 years, 1 week, 5 days ago 0|i07ju7:
ZooKeeper ZOOKEEPER-1400

Allow logging via callback instead of raw FILE pointer

Improvement Resolved Major Fixed Michi Mutsuzaki Marshall McMullen Marshall McMullen 23/Feb/12 17:18   21/Aug/13 07:06 21/Aug/13 05:41 3.5.0 3.5.0 c client   0 5   Linux The existing logging framework inside the C client uses a raw FILE*. Using a FILE* is very limiting and potentially dangerous. A safer alternative is to just provide a callback that the C client will call for each message. In our environment, we saw some really nasty issues with multiple threads all connecting to zookeeper via the C Client related to the use of a raw FILE*. Specifically, if the FILE * is closed and that file descriptor is reused by the kernel before the C client is notified, then the C client will keep using its static global logStream pointer for subsequent logging messages. That FILE* is now a loose cannon! In our environment, we saw zookeeper log messages ending up in other sockets and even in our core data path. Clearly this is dangerous. In our particular case, we'd omitted a call to zoo_set_log_stream(NULL) to notify the C client that the FILE* had been closed. However, even with that bug fixed, there's still a race condition in which log messages in flight may be sent before the C client is notified of the FILE closure, and the same problem can happen.

Other issues we've seen involved multiple threads, wherein one would close the FILE*, and that's a global change that affects all threads connected within that process. That's a pretty nasty limitation as well.

My proposed change is to allow setting a callback for log messages. A callback is used in preference to a raw FILE*. If no callback is set, it will fall back to the existing FILE*. If that's not set either, it falls back to stderr as it always has.

While refactoring this code, I removed the need for the double parens in all the LOG macros, as they weren't necessary and didn't fit with my new approach.
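The proposed fallback order (callback, then the configured stream, then stderr) is language-neutral, so here is a hypothetical Java analogue for illustration; the actual change is in the C client. The fields are volatile and read once per call so that a concurrent reset cannot swap the sink halfway through a message.

```java
import java.io.PrintStream;
import java.util.function.Consumer;

/** Hypothetical Java analogue of the proposed C-client logging fallback order. */
final class ClientLog {
    private volatile Consumer<String> callback; // preferred sink, if set
    private volatile PrintStream stream;        // legacy sink, standing in for the raw FILE*

    void setCallback(Consumer<String> cb) { this.callback = cb; }
    void setStream(PrintStream s) { this.stream = s; }

    /** Routes one message: callback first, then the configured stream, then stderr. */
    void log(String message) {
        Consumer<String> cb = callback; // read once so a concurrent reset cannot null it mid-call
        if (cb != null) {
            cb.accept(message);
            return;
        }
        PrintStream s = stream;
        (s != null ? s : System.err).println(message);
    }
}
```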
229151 No Perforce job exists for this issue. 8 2585
6 years, 31 weeks, 1 day ago 0|i00spz:
ZooKeeper ZOOKEEPER-1399

Binary Jar in zookeeper-3.3.4 displays wrong version when run

Bug Open Minor Unresolved Unassigned Mike Lundy Mike Lundy 22/Feb/12 21:16   22/Feb/12 21:16   3.3.4   build, server   0 0   When you start up zookeeper using the jar in zookeeper-3.3.4.tar.gz, it prints a 3.3.3 version string:

server.ZooKeeperServer - Server environment:zookeeper.version=3.3.3-1203054, built on 11/17/2011 05:47 GMT
server.ZooKeeperServer - Server environment:java.class.path=/usr/lib/zookeeper/apache-rat-tasks-0.6.jar:/usr/lib/zookeeper/commons-lang-2.4.jar:/usr/lib/zookeeper/commons-cli-1.1.jar:/usr/lib/zookeeper/log4j-1.2.15.jar:/usr/lib/zookeeper/commons-collections-3.2.jar:/usr/lib/zookeeper/apache-rat-core-0.6.jar:/usr/lib/zookeeper/jline-0.9.94.jar:/usr/lib/zookeeper/zookeeper-3.3.4.jar:/etc/zookeeper

I assume this is due to a build problem of some form. (Rebuilding the jar from the tarball fixes the version).
229029 No Perforce job exists for this issue. 0 32573
8 years, 5 weeks ago 0|i05xsf:
ZooKeeper ZOOKEEPER-1398

zkpython corrupts session passwords that contain nulls

Bug Open Critical Unresolved Mike Lundy Mike Lundy Mike Lundy 22/Feb/12 14:10   25/Sep/14 08:56   3.3.4   c client, contrib-bindings   0 3   If the session password contains a nul character (\0), it will be mutated as it is passed to python. zkpython currently uses the ParseArgs flag that stops on nul. 228972 No Perforce job exists for this issue. 1 32574
5 years, 26 weeks ago 0|i05xsn:
ZooKeeper ZOOKEEPER-1397

Remove BookKeeper documentation links

Improvement Resolved Major Fixed Flavio Paiva Junqueira Flavio Paiva Junqueira Flavio Paiva Junqueira 22/Feb/12 03:20   17/Mar/12 19:11 17/Mar/12 18:16   3.5.0     0 0   BookKeeper is now a subproject and its documentation is maintained in the site of the subproject. Consequently, we should remove the links in the zookeeper documentation pages or at least point to the documentation of the subproject site. 228878 No Perforce job exists for this issue. 1 33282
8 years, 1 week, 5 days ago 0|i0625z:
ZooKeeper ZOOKEEPER-1396

Create zoo_append API

Improvement Open Minor Unresolved Unassigned Stephen Tyree Stephen Tyree 21/Feb/12 20:36   21/Feb/12 20:36       c client, java client, server   0 0   I was trying to append data to a znode from the C library and I realized the workflow for that is pretty unfortunate. Essentially you need to do the following:

- call zoo_exists to get the Stat structure which contains the data length of the znode
- Allocate that many bytes plus how many you are adding to the znode dynamically in a buffer
- call zoo_get to get the data for the znode
- append the new data to the znode's contents in your local buffer
- call zoo_set to set the data back into the znode

If the data changes between the zoo_get and the zoo_set, sorry! You have to start from scratch. For a case where multiple consumers are trying to append data to a znode, this can become a nuisance. If there existed a zoo_append API, the workflow would become:

- call zoo_append to append the data into the znode
- If that fails because the znode does not exist, call zoo_create to create the znode with the data

This assumes zoo_append would not create the znode. It would mean fewer round trips to the server and simpler code. Even the Java library, which wouldn't need to worry about calling zoo_exists, would have one fewer round trip in the typical case.
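For comparison, the conditional write that already exists (the version argument of zoo_set / setData, which makes the write fail if the znode changed since it was read) turns the current workflow into a retry loop. The Java sketch below models that loop against a hypothetical in-memory VersionedStore rather than a real server, so it can be run on its own.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for znodes with versioned, conditional writes. */
final class VersionedStore {
    static final class Versioned {
        final byte[] data;
        final int version;
        Versioned(byte[] data, int version) { this.data = data; this.version = version; }
    }

    private final Map<String, Versioned> nodes = new HashMap<>();

    synchronized void create(String path, byte[] data) { nodes.put(path, new Versioned(data.clone(), 0)); }

    synchronized Versioned get(String path) {
        Versioned v = nodes.get(path);
        return new Versioned(v.data.clone(), v.version);
    }

    /** Writes only if the stored version still equals expectedVersion, then bumps the version. */
    synchronized boolean set(String path, byte[] data, int expectedVersion) {
        Versioned v = nodes.get(path);
        if (v.version != expectedVersion) return false; // someone else wrote in between
        nodes.put(path, new Versioned(data.clone(), v.version + 1));
        return true;
    }

    /** Read-modify-write append that retries until no concurrent writer slipped in. */
    static int append(VersionedStore store, String path, byte[] suffix) {
        for (int attempt = 1; ; attempt++) {
            Versioned current = store.get(path);
            byte[] merged = Arrays.copyOf(current.data, current.data.length + suffix.length);
            System.arraycopy(suffix, 0, merged, current.data.length, suffix.length);
            if (store.set(path, merged, current.version)) return attempt; // attempts used
        }
    }
}
```

Each failed attempt costs another get/set pair, which is the extra round-trip cost a server-side zoo_append would avoid under contention.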

Is this a typical workflow for people? Would anyone else find this API valuable?
228843 No Perforce job exists for this issue. 0 41982
8 years, 5 weeks, 1 day ago 0|i07juf:
ZooKeeper ZOOKEEPER-1395

node-watcher double-free redux

Bug Resolved Critical Fixed Mike Lundy Mike Lundy Mike Lundy 21/Feb/12 15:01   25/Apr/12 19:37 30/Mar/12 18:31 3.3.4 3.3.6, 3.4.4, 3.5.0 c client, contrib-bindings   0 1   This is basically the same issue as ZOOKEEPER-888 and ZOOKEEPER-740 (the latter is open as I write this, but it was superseded by the fix that went in with 888). The problem still exists after the ZOOKEEPER-888 patch, however; it's just more difficult to trigger:

1) Zookeeper notices connection loss, schedules watcher_dispatch
2) Zookeeper notices session loss, schedules watcher_dispatch
3) watcher_dispatch runs for connection loss
4) pywatcher is freed due to is_unrecoverable being true
5) watcher_dispatch runs for session loss
6) PyObject_CallObject attempts to run freed pywatcher with varying bad results

The fix is easy: the dispatcher should act on the state it is given, not on the state of the world when it runs (patch attached). Reliably triggering the crash is tricky due to the race, but it's not theoretical.
228790 No Perforce job exists for this issue. 2 32575
7 years, 51 weeks, 5 days ago 0|i05xsv:
ZooKeeper ZOOKEEPER-1394

ClassNotFoundException on shutdown of client

Bug Resolved Minor Not A Problem wu wen Herman Meerlo Herman Meerlo 21/Feb/12 08:26   17/Feb/17 08:44 30/Oct/16 22:22 3.4.2   java client   1 5   ZOOKEEPER-2618 OS X 10.7 java version "1.6.0_29" When close() is called on the ZooKeeper instance from a ContextListener (contextDestroyed) there is no way to synchronize with the fact that the EventThread and SendThread have actually finished their work. The problem lies in the SendThread which makes a call to ZooTrace when it exits, but that class has not been loaded yet. Because the ContextListener could not synchronize with the death of the threads the classloader has already disappeared, resulting in a ClassNotFoundException.
My personal opinion is that the close() method should probably wait until the event and send threads have actually died.
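A hypothetical sketch of the suggested behavior (not the ZooKeeper client's code): close() signals both worker threads, interrupts them, and joins them with a bounded wait. In the reported scenario, this would let contextDestroyed return only after the threads that still need classes from the web-app classloader have exited.

```java
/** Hypothetical client whose close() waits for its worker threads to exit. */
final class JoiningClient {
    private volatile boolean closing;
    private final Thread sendThread;
    private final Thread eventThread;

    JoiningClient() {
        Runnable worker = () -> {
            while (!closing) {
                try {
                    Thread.sleep(5); // placeholder for real send/event work
                } catch (InterruptedException e) {
                    return; // interrupted by close(): exit promptly
                }
            }
        };
        sendThread = new Thread(worker, "send-thread");
        eventThread = new Thread(worker, "event-thread");
        sendThread.setDaemon(true);
        eventThread.setDaemon(true);
        sendThread.start();
        eventThread.start();
    }

    /** Signals shutdown and waits up to timeoutMs per thread; returns true if both exited. */
    boolean close(long timeoutMs) {
        closing = true;
        sendThread.interrupt();
        eventThread.interrupt();
        try {
            sendThread.join(timeoutMs);
            eventThread.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !sendThread.isAlive() && !eventThread.isAlive();
    }
}
```

A bounded join keeps a stuck worker from hanging the container's shutdown indefinitely; the boolean result lets the caller log if the threads did not finish in time.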
228730 No Perforce job exists for this issue. 1 32576
3 years, 20 weeks, 3 days ago 0|i05xt3:
ZooKeeper ZOOKEEPER-1393

ZooKeeper client exists() javadoc incorrectly states watcher(s) will be triggered on node deletion

Bug Resolved Minor Invalid Unassigned Gary Malouf Gary Malouf 15/Feb/12 14:11   25/Feb/12 13:41 25/Feb/12 13:41 3.3.4, 3.4.2   java client   0 1 1200 1200 0% I found it very misleading that the javadoc for the exists() calls that take a boolean or a Watcher states that 'The watch will be triggered by a successful operation that creates/delete the node or sets the data on the node.'

What I've seen from bug descriptions (an older thread, but it references this: http://zookeeper-user.578899.n2.nabble.com/Exists-Watch-Triggered-by-Delete-td1490893.html) and from my own usage is that watches set by exists() are triggered when a previously non-existent node is created or an existing node is changed. They are NOT triggered when the node already exists and is deleted.

http://zookeeper.apache.org/doc/r3.4.3/api/index.html
0% 0% 1200 1200 228019 No Perforce job exists for this issue. 0 32577
8 years, 4 weeks, 5 days ago 0|i05xtb:
ZooKeeper ZOOKEEPER-1392

Should not allow to read ACL when not authorized to read node

Bug Closed Major Fixed Bruce Gao Thomas Weise Thomas Weise 12/Feb/12 20:45   02/Apr/19 06:40 06/Feb/19 09:40 3.4.2 3.6.0, 3.5.5, 3.4.14 server   0 6   Not authorized to read, yet still able to list ACL:

[zk: localhost:2181(CONNECTED) 0] getAcl /sasltest/n4
'sasl,'notme@EXAMPLE.COM
: cdrwa
[zk: localhost:2181(CONNECTED) 1] get /sasltest/n4
Exception in thread "main" org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /sasltest/n4
at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1131)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1160)
at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:711)
at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:593)
at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:365)
at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:323)
at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:282)
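A minimal sketch of the check being asked for. It assumes that reading a node's ACL should require the same READ permission as reading its data; that requirement is an assumption, and a fix could also accept ADMIN. The bit values mirror ZooKeeper's ZooDefs.Perms (READ=1, WRITE=2, CREATE=4, DELETE=8, ADMIN=16). AclCheck and its methods are hypothetical, and identity matching is simplified to exact "scheme:id" string equality.

```java
import java.util.Map;
import java.util.Set;

class AclCheck {
    // Bit values as in ZooDefs.Perms; "cdrwa" is all five bits, i.e. 31.
    static final int READ = 1, WRITE = 2, CREATE = 4, DELETE = 8, ADMIN = 16;

    // acl maps "scheme:id" to granted bits; callerIds are the caller's authenticated ids.
    static boolean permits(Map<String, Integer> acl, Set<String> callerIds, int required) {
        for (Map.Entry<String, Integer> e : acl.entrySet()) {
            if ((e.getValue() & required) != 0 && callerIds.contains(e.getKey())) {
                return true;
            }
        }
        return false;
    }

    // getAcl would apply the same READ check that get (getData) already applies.
    static boolean canGetAcl(Map<String, Integer> acl, Set<String> callerIds) {
        return permits(acl, callerIds, READ);
    }
}
```

With the ACL from the session above (sasl:notme@EXAMPLE.COM granted cdrwa), a caller that is not notme@EXAMPLE.COM would be refused by getAcl, just as it is by get.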
227630 No Perforce job exists for this issue. 1 32578
1 year, 6 weeks, 1 day ago 0|i05xtj:
ZooKeeper ZOOKEEPER-1391

zkCli dies on NoAuth

Bug Resolved Major Duplicate Hartmut Lang Thomas Weise Thomas Weise 12/Feb/12 20:41   26/Apr/12 04:47 26/Apr/12 04:47 3.4.2 3.5.0 java client   0 1   [zk: localhost:2181(CONNECTED) 1] create /sasltest/n4 c sasl:notme@EXAMPLE.COM:cdrwa
Created /sasltest/n4
[zk: localhost:2181(CONNECTED) 2] ls /sasltest/n4
Exception in thread "main" org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /sasltest/n4
at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1448)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1476)
at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:717)
at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:593)
at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:365)
at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:323)
at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:282)
227629 No Perforce job exists for this issue. 2 32579
7 years, 48 weeks ago 0|i05xtr:
ZooKeeper ZOOKEEPER-1390

some expensive debug code not protected by a check for debug

Improvement Resolved Major Fixed Benjamin Reed Benjamin Reed Benjamin Reed 10/Feb/12 00:12   17/Mar/12 12:07 17/Mar/12 12:07   3.4.4, 3.5.0 server   0 0   There is some debug code in DataTree.processTxn() that formats transactions. The formatting is very expensive, yet its output is only used when errors happen and debugging is turned on. 227362 No Perforce job exists for this issue. 1 33283
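A minimal sketch of the guard pattern for the DataTree.processTxn() case, using java.util.logging as a stand-in for ZooKeeper's own logging facade. The choice of logger is an assumption, and the actual fix may use a different API. The costly formatting runs only when the debug level is enabled. formatTxn and the call counter are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class DebugGuardExample {
    private static final Logger LOG = Logger.getLogger(DebugGuardExample.class.getName());

    // Counts how often the expensive formatter actually runs (for illustration only).
    static int formatCalls = 0;

    // Hypothetical expensive formatter standing in for transaction pretty-printing.
    static String formatTxn(byte[] txn) {
        formatCalls++;
        StringBuilder sb = new StringBuilder();
        for (byte b : txn) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    static void processTxn(byte[] txn, boolean failed) {
        // Only pay for formatting when the message can actually be logged.
        if (failed && LOG.isLoggable(Level.FINE)) {
            LOG.fine("Failed txn: " + formatTxn(txn));
        }
    }
}
```

Without the isLoggable check, the string concatenation (and therefore formatTxn) would be evaluated on every failure, even if the logger then discarded the message.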
8 years, 1 week, 5 days ago 0|i06267:
ZooKeeper ZOOKEEPER-1389

it would be nice if start-foreground used exec $JAVA in order to get rid of the intermediate shell process

Improvement Resolved Major Fixed Roman Shaposhnik Roman Shaposhnik Roman Shaposhnik 08/Feb/12 14:42   16/Feb/12 05:55 15/Feb/12 18:03 3.4.2 3.3.5, 3.4.4, 3.5.0 scripts, server   0 0   A lot of daemon management tools expect the process itself to be running as a child rather than a grandchild. It would be nice if we had an option for that in zkServer.sh. 227153 No Perforce job exists for this issue. 1 12515
8 years, 6 weeks ago
Reviewed
0|i02hzj:
ZooKeeper ZOOKEEPER-1388

Client side 'PathValidation' is missing for the multi-transaction api.

Bug Closed Major Fixed Rakesh Radhakrishnan Rakesh Radhakrishnan Rakesh Radhakrishnan 07/Feb/12 00:17   13/Mar/14 14:17 17/Dec/13 11:57 3.4.0 3.4.6, 3.5.0 java client   0 9   Multi ops: Op.create(path,..), Op.delete(path, ..), Op.setData(path, ..),
Op.check(path, ...) APIs do not perform client-side path validation, so the call goes to the server, which throws an exception back to the client.

It would be good to provide ZooKeeper client-side path validation for the multi-transaction APIs. Presently the client gets error codes from the server, which do not properly convey the cause.

For example, when an invalid znode path is specified in Op.create, the following exception is thrown, which does not reveal the actual cause:
{code}
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1174)
at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1115)
{code}
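A minimal sketch of client-side validation that could run on each Op's path before a multi() request is sent. The rules below approximate ZooKeeper's path rules, and that approximation is an assumption: absolute path, no trailing slash except for the root, no empty, "." or ".." segments, no null character. The real client's validator may check more (for example, other disallowed characters). PathCheck, validatePath and isValid are hypothetical names.

```java
class PathCheck {
    // Throws IllegalArgumentException with a descriptive message if the path is invalid.
    static void validatePath(String path) {
        if (path == null || path.isEmpty()) {
            throw new IllegalArgumentException("Path cannot be null or empty");
        }
        if (path.charAt(0) != '/') {
            throw new IllegalArgumentException("Path must start with / character: " + path);
        }
        if (path.length() == 1) {
            return; // the root path "/" is valid
        }
        if (path.endsWith("/")) {
            throw new IllegalArgumentException("Path must not end with / character: " + path);
        }
        if (path.indexOf('\0') >= 0) {
            throw new IllegalArgumentException("Null character not allowed in path: " + path);
        }
        // Drop the leading '/' and keep empty segments (limit -1) so "//" is detected.
        String[] segments = path.substring(1).split("/", -1);
        for (String seg : segments) {
            if (seg.isEmpty()) {
                throw new IllegalArgumentException("Empty node name in path: " + path);
            }
            if (seg.equals(".") || seg.equals("..")) {
                throw new IllegalArgumentException("Relative path segment not allowed: " + path);
            }
        }
    }

    static boolean isValid(String path) {
        try {
            validatePath(path);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

Calling validatePath on every Op before sending means the user sees a message naming the bad path, instead of a bare NoNode error from the server.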
226850 No Perforce job exists for this issue. 6 32580
6 years, 2 weeks ago
Reviewed
0|i05xtz:
ZooKeeper ZOOKEEPER-1387

Wrong epoch file created

Bug Closed Minor Fixed Benjamin Reed Benjamin Busjaeger Benjamin Busjaeger 06/Feb/12 00:57   13/Mar/14 14:16 13/Dec/12 03:00 3.4.2 3.4.6, 3.5.0 quorum   0 4   It looks like line 443 in QuorumPeer [1] may need to change from:

writeLongToFile(CURRENT_EPOCH_FILENAME, acceptedEpoch);

to

writeLongToFile(ACCEPTED_EPOCH_FILENAME, acceptedEpoch);

I only noticed this while reading the code, so I may be wrong, and I don't know yet if or how this affects the runtime.

[1] https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java#L443
226662 No Perforce job exists for this issue. 2 2376
6 years, 2 weeks ago
Reviewed
0|i00rfj:
ZooKeeper ZOOKEEPER-1386

avoid flaky URL redirection in "ant javadoc" : replace "http://java.sun.com/javase/6/docs/api/" with "http://download.oracle.com/javase/6/docs/api/"

Bug Resolved Minor Fixed Eugene Joseph Koontz Eugene Joseph Koontz Eugene Joseph Koontz 03/Feb/12 14:14   27/Feb/12 19:23 26/Feb/12 20:01   3.3.5, 3.4.4, 3.5.0 documentation   0 1   It seems that the current javadoc.link.java value, http://java.sun.com/javase/6/docs/api/, redirects (via HTTP 301) to http://download.oracle.com/javase/6/docs/api/. Apparently this redirect does not always work, causing the URL fetch to fail. This causes an additional javadoc warning:

javadoc: warning - Error fetching URL: http://java.sun.com/javase/6/docs/api/package-list

which can in turn cause Jenkins to give a -1 to an otherwise OK build (see e.g. https://issues.apache.org/jira/browse/ZOOKEEPER-1373?focusedCommentId=13199456&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13199456).

226473 No Perforce job exists for this issue. 1 12514
8 years, 4 weeks, 3 days ago 0|i02hzb:
ZooKeeper ZOOKEEPER-1385

zookeeper.apache.org/doc/trunk/ has broken pointers

Bug Open Major Unresolved Unassigned Flavio Paiva Junqueira Flavio Paiva Junqueira 01/Feb/12 12:27   09/Oct/13 02:44       documentation   0 0   API Docs gives a "Not found" message. 226136 No Perforce job exists for this issue. 0 32581
8 years, 8 weeks, 1 day ago 0|i05xu7:
ZooKeeper ZOOKEEPER-1384

test-cppunit overrides LD_LIBRARY_PATH and fails if gcc is in non-standard location

Bug Resolved Minor Fixed Jay Shrauner Jay Shrauner Jay Shrauner 31/Jan/12 19:51   19/Mar/12 07:00 19/Mar/12 02:17 3.4.2 3.4.4, 3.5.0 build, tests   0 1   Linux On Linux with gcc installed in /usr/local and the libs in /usr/local/lib64, test-core-cppunit fails because zktest-st is unable to find the right libstdc++.

build.xml is overriding the environment LD_LIBRARY_PATH instead of appending to it. This should be changed to match the treatment of PATH by appending the desired extra path.
226057 No Perforce job exists for this issue. 1 32582
8 years, 1 week, 3 days ago
Reviewed
0|i05xuf:
ZooKeeper ZOOKEEPER-1383

Create update throughput quotas and add hard quota limits

New Feature Open Major Unresolved Thawan Kooburat Jay Shrauner Jay Shrauner 31/Jan/12 19:09   16/Jun/19 22:36       server   0 3   Quotas exist for size (node count and size in bytes); it would be useful to track and support quotas on update throughput (bytes per second) as well. This could be tracked both at the node/subtree level for quota support and at the server level for monitoring.

In addition, the existing quotas log a warning when they are exceeded but allow the transaction to proceed (soft quotas). It would also be useful to support a corresponding set of hard quota limits that fail the transaction.
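A minimal sketch of how a per-subtree throughput quota with soft and hard limits might be checked. It assumes a fixed one-second window, which is only one possible rate-measurement scheme (a sliding window or token bucket would also work). ThroughputQuota and its fields are hypothetical, not ZooKeeper's quota implementation. Exceeding the soft limit only produces a warning decision, while exceeding the hard limit rejects the transaction.

```java
class ThroughputQuota {
    enum Decision { ALLOW, WARN, REJECT }

    private final long softBytesPerSec;
    private final long hardBytesPerSec;
    private long windowStartMillis = -1;
    private long bytesInWindow = 0;

    ThroughputQuota(long softBytesPerSec, long hardBytesPerSec) {
        this.softBytesPerSec = softBytesPerSec;
        this.hardBytesPerSec = hardBytesPerSec;
    }

    // nowMillis is passed in so the check is deterministic and testable.
    synchronized Decision check(long updateBytes, long nowMillis) {
        if (windowStartMillis < 0 || nowMillis - windowStartMillis >= 1000) {
            windowStartMillis = nowMillis; // start a new one-second window
            bytesInWindow = 0;
        }
        long projected = bytesInWindow + updateBytes;
        if (projected > hardBytesPerSec) {
            return Decision.REJECT; // hard limit: fail the txn and do not count its bytes
        }
        bytesInWindow = projected;
        // Soft limit: let the txn proceed, but tell the caller to log a warning.
        return projected > softBytesPerSec ? Decision.WARN : Decision.ALLOW;
    }
}
```

Rejected updates are deliberately not counted against the window, so a client that is over the hard limit is not penalized further for transactions that never happened.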
226050 No Perforce job exists for this issue. 4 2588
7 years, 5 weeks, 1 day ago Adds support for throughput quotas (soft and hard) and hard node count and hard size quotas. Parses quota nodes from older versions of the server and preserves behavior of existing quotas (soft node count and soft size). quotas 0|i00sqn:
ZooKeeper ZOOKEEPER-1382

Zookeeper server holds onto dead/expired session ids in the watch data structures

Bug Closed Critical Fixed Germán Blanco Neha Narkhede Neha Narkhede 30/Jan/12 20:06   14/Oct/16 01:47 11/Dec/13 14:18 3.4.5 3.4.6, 3.5.0 server   2 18   I've observed that the ZooKeeper server holds onto expired session ids in the watcher data structures. As a result, the wchp command reports session ids that cannot be found through cons/dump, and those expired session ids stay there, possibly until the server is restarted. Here are snippets from the client and server logs that lead to this state, for one particular session id 0x134485fd7bcb26f -

There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 and I'm using ZkClient to connect to the cluster

From the application log -

application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] [main-SendThread(223.prod:12913)] [application] Session establishment complete on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, negotiated timeout = 6000
application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] [main-SendThread(223.prod:12913)] [application] Client session timed out, have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing socket connection and attempting reconnect
application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] [main-SendThread(226.prod:12913)] [application] Unable to reconnect to ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket connection

On the leader zk, 225 -

zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, timeout of 6000ms exceeded
zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination for sessionid: 0x134485fd7bcb26f

On the server, the client was initially connected to, 223 -

zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO [CommitProcessor:1:NIOServerCnxn@1580] - Established session 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020
zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f

Here are the log snippets from 226, the server the client reconnected to, before it got the session expired event -

2012-01-27 09:52:38,190 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367
2012-01-27 09:52:38,191 - INFO [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired
2012-01-27 09:52:38,191 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed socket connection for client /172.17.136.82:49367 which had sessionid 0x134485fd7bcb26f

wchp output from 226, taken on 01/30 -

nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *226.*wchp* | wc -l
3

wchp output from 223, taken on 01/30 -

nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *223.*wchp* | wc -l
0

cons output from 223 and 226, taken on 01/30 -

nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *226.*cons* | wc -l
0

nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *223.*cons* | wc -l
0

So what seems to have happened is that the client was able to re-register the watches on the new server (226) after it got disconnected from 223, in spite of having an expired session id.


In NIOServerCnxn, I saw that after suspecting that a session is expired, a server removes the cnxn and its watches from its internal data structures. But before that it allows more requests to be processed even if the session is expired -

{code}
    // Now that the session is ready we can start receiving packets
    synchronized (this.factory) {
        sk.selector().wakeup();
        enableRecv();
    }
} catch (Exception e) {
    LOG.warn("Exception while establishing session, closing", e);
    close();
}
{code}

I wonder whether the client somehow sneaked in the setWatches request right after the server removed the connection through the removeCnxn() API?
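A minimal sketch of the invariant the report is after. It assumes a simplified watch table keyed by path and session id; WatchTable and its methods are hypothetical, not ZooKeeper's WatchManager. Once a session is closed, all of its watches are removed and any later attempt to register watches for it is refused, so a late setWatches cannot resurrect watches for a dead session.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class WatchTable {
    private final Map<String, Set<Long>> watchesByPath = new HashMap<>();
    private final Set<Long> closedSessions = new HashSet<>();

    // Returns false if the session is already closed; the watch is not recorded.
    synchronized boolean addWatch(String path, long sessionId) {
        if (closedSessions.contains(sessionId)) {
            return false;
        }
        watchesByPath.computeIfAbsent(path, p -> new HashSet<>()).add(sessionId);
        return true;
    }

    // Marks the session closed and drops every watch it owns, under the same lock
    // as addWatch, so no registration can slip in between the two steps.
    synchronized void closeSession(long sessionId) {
        closedSessions.add(sessionId);
        watchesByPath.values().removeIf(sessions -> {
            sessions.remove(sessionId);
            return sessions.isEmpty();
        });
    }

    // Number of paths on which the session still holds a watch.
    synchronized int watchCount(long sessionId) {
        int n = 0;
        for (Set<Long> sessions : watchesByPath.values()) {
            if (sessions.contains(sessionId)) {
                n++;
            }
        }
        return n;
    }
}
```

In this sketch closedSessions grows without bound. A real server would need to prune it, for example once a session's expiry has been fully processed across the ensemble.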
225890 No Perforce job exists for this issue. 10 32583
3 years, 22 weeks, 6 days ago 0|i05xun:
ZooKeeper ZOOKEEPER-1381

Add a method to get the zookeeper server version from the client

Improvement Open Minor Unresolved Unassigned Nicolas Liochon Nicolas Liochon 30/Jan/12 16:31   28/Jun/12 04:10   3.4.2   c client, documentation, java client, server   0 3   all The ZooKeeper client API is designed to be as server-version agnostic as possible, so we can have new clients with old servers (or the opposite). However, there is currently no simple way for a client to find out the server version. This would be very useful in order to:
- check compatibility (e.g. 'multi' is only available since 3.4, while the 3.4 client API supports 3.3 servers as well)
- choose a different implementation depending on the server's functionality

A workaround (proposed by Mahadev Konar) is to run "echo stat | nc hostname clientport" and parse the output to get the version. The output is, for example:
-----------------------
Zookeeper version: 3.4.2--1, built on 01/30/2012 17:43 GMT
Clients:
/127.0.0.1:54951[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/0
Received: 1
Sent: 0
Outstanding: 0
Zxid: 0x500000001
Mode: follower
Node count: 7
--------------------
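A minimal sketch of the stat-based workaround in Java: send the four-letter word stat over a plain socket and parse the "Zookeeper version:" line. The class and method names are hypothetical. The line format is taken from the sample output above and may differ across server versions; newer servers may also restrict which four-letter words are enabled.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class ServerVersionProbe {
    private static final String PREFIX = "Zookeeper version: ";

    // Extracts e.g. "3.4.2--1" from stat output; returns null if the line is absent.
    static String parseVersion(String statOutput) {
        for (String line : statOutput.split("\\r?\\n")) {
            if (line.startsWith(PREFIX)) {
                String rest = line.substring(PREFIX.length());
                int comma = rest.indexOf(',');
                return (comma >= 0 ? rest.substring(0, comma) : rest).trim();
            }
        }
        return null;
    }

    // Equivalent of: echo stat | nc host port (the server closes the socket after replying).
    static String fetchStat(String host, int port) throws IOException {
        try (Socket s = new Socket(host, port)) {
            OutputStream out = s.getOutputStream();
            out.write("stat".getBytes(StandardCharsets.US_ASCII));
            out.flush();
            StringBuilder sb = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(s.getInputStream(), StandardCharsets.US_ASCII))) {
                String line;
                while ((line = in.readLine()) != null) {
                    sb.append(line).append('\n');
                }
            }
            return sb.toString();
        }
    }
}
```

Usage would be parseVersion(fetchStat("localhost", 2181)). Parsing is kept separate from I/O so that it can be checked against captured output such as the sample above.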
newbie 225852 No Perforce job exists for this issue. 1 41983
7 years, 39 weeks ago 0|i07jun:
ZooKeeper ZOOKEEPER-1380

zkperl: _zk_release_watch doesn't remove items properly from the watch list

Bug Resolved Major Fixed Botond Hejj Botond Hejj Botond Hejj 30/Jan/12 12:14   07/Sep/12 07:01 07/Sep/12 02:15 3.3.3, 3.3.4, 3.4.0, 3.4.1, 3.4.2 3.4.4, 3.5.0 contrib-bindings   0 3   The doubly linked list of watches is not updated properly if a watch is taken out from the middle of the chain.
The item after the removed item receives a null pointer for its previous element! This makes the doubly linked list inconsistent and can lead to a segfault or an infinite loop when the list is iterated later.
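A minimal Java sketch of correct removal from a doubly linked list. The zkperl binding itself is C, and WatchList and Node are hypothetical names. Both neighbours must be relinked, so the successor's prev pointer ends up at the removed node's predecessor rather than at null.

```java
class WatchList {
    static final class Node {
        final String name;
        Node prev, next;
        Node(String name) { this.name = name; }
    }

    Node head;

    Node addFirst(String name) {
        Node n = new Node(name);
        n.next = head;
        if (head != null) {
            head.prev = n;
        }
        head = n;
        return n;
    }

    void remove(Node n) {
        if (n.prev != null) {
            n.prev.next = n.next;   // predecessor skips over n
        } else {
            head = n.next;          // n was the head
        }
        if (n.next != null) {
            n.next.prev = n.prev;   // successor points back at n's predecessor, not null
        }
        n.prev = n.next = null;     // detach n so stale links cannot be followed
    }

    // Walks forward and checks every back-link; returns false if the list is inconsistent.
    boolean isConsistent() {
        Node prev = null;
        for (Node cur = head; cur != null; cur = cur.next) {
            if (cur.prev != prev) {
                return false;
            }
            prev = cur;
        }
        return true;
    }
}
```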
225810 No Perforce job exists for this issue. 1 32584
7 years, 28 weeks, 6 days ago
Reviewed
zookeeper perl zkperl Net-ZooKeeper 0|i05xuv:
ZooKeeper ZOOKEEPER-1379

'printwatches', 'redo', 'history' and 'connect' client commands always print usage. This is not necessary

Bug Closed Minor Fixed Edward Ribeiro kavita sharma kavita sharma 30/Jan/12 03:43   13/Mar/14 14:17 02/Sep/13 17:05 3.4.0 3.4.6, 3.5.0 java client   0 4   While executing the commands printwatches, redo, history and connect, the usage gets printed.
We normally print usage only if the user has entered a command
incorrectly, but for these commands it is printed every time.
e.g.
{noformat}
[zk: localhost:2181(CONNECTED) 0] printwatches
printwatches is on
ZooKeeper -server host:port cmd args
connect host:port
get path [watch]
ls path [watch]
set path data [version]
delquota [-n|-b] path
quit
printwatches on|off
create [-s] [-e] path data acl
stat path [watch]
close
ls2 path [watch]
history
listquota path
setAcl path acl
getAcl path
sync path
redo cmdno
addauth scheme auth
delete path [version]
setquota -n|-b val path
{noformat}
225740 No Perforce job exists for this issue. 5 41984
6 years, 2 weeks ago 0|i07juv:
ZooKeeper ZOOKEEPER-1378

Provide option to turn off sending of diffs

Task Open Major Unresolved Unassigned Ted Yu Ted Yu 29/Jan/12 17:51   05/Feb/20 07:16     3.7.0, 3.5.8     0 4   From Patrick:
we need to have an option to turn off sending of diffs. There are a couple of really strong reasons I can think of to do this:

1) 3.3.x is broken in a similar way; there is an upgrade problem we can't solve short of having people first upgrade to a fixed 3.3 (3.3.5, say) and then upgrade to 3.4.x. If we could turn off diff sending, this would address the problem.

2) Safety valve. Say we find another new problem with diff sending in 3.4/3.5. Having an option to turn it off would be useful for people as a workaround until a fix is found and released.
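A minimal sketch of such a switch. It assumes a hypothetical system property (zookeeper.example.disableDiffSync, not a real ZooKeeper option) that forces the leader to send a full snapshot instead of a diff when syncing a learner. SyncPolicy and SyncMode are hypothetical, and the zxid range test is a simplified stand-in for the leader's actual sync decision (which also handles cases such as truncation).

```java
class SyncPolicy {
    enum SyncMode { DIFF, SNAP }

    // Hypothetical property name used only for this sketch.
    static final String DISABLE_DIFF_PROPERTY = "zookeeper.example.disableDiffSync";

    // Chooses how to bring a learner up to date. In this simplified model a diff is only
    // possible when the learner's last zxid lies within the leader's in-memory committed log.
    static SyncMode choose(long learnerZxid, long minCommittedLog, long maxCommittedLog) {
        boolean diffDisabled = Boolean.getBoolean(DISABLE_DIFF_PROPERTY);
        boolean diffPossible = learnerZxid >= minCommittedLog && learnerZxid <= maxCommittedLog;
        return (!diffDisabled && diffPossible) ? SyncMode.DIFF : SyncMode.SNAP;
    }
}
```

An operator would then start the leader with -Dzookeeper.example.disableDiffSync=true to fall back to snapshot transfers while a diff bug is investigated. That is the "safety valve" described in point 2.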
225720 No Perforce job exists for this issue. 0 41985
4 years, 2 days ago 0|i07jv3:
ZooKeeper ZOOKEEPER-1377

add support for dumping a snapshot file content (similar to LogFormatter)

Improvement Resolved Major Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 27/Jan/12 13:45   18/Mar/12 18:26 18/Mar/12 18:26   3.4.4, 3.5.0 server   0 1   We have LogFormatter but not SnapshotFormatter. I've added this, patch momentarily. newbie 225592 No Perforce job exists for this issue. 2 33284
8 years, 1 week, 4 days ago 0|i0626f:
ZooKeeper ZOOKEEPER-1376

zkServer.sh does not correctly check for $SERVER_JVMFLAGS

Bug Resolved Minor Fixed Skye Wanderman-Milne Patrick D. Hunt Patrick D. Hunt 26/Jan/12 20:39   24/Sep/12 14:29 21/Sep/12 19:05 3.3.3, 3.3.4 3.3.7, 3.4.5 scripts   0 3   The script always includes it, even if it is not defined (although there is not much harm), because the test below is always true: "x$SERVER_JVMFLAGS" is never empty.

if [ "x$SERVER_JVMFLAGS" ]
then
JVMFLAGS="$SERVER_JVMFLAGS $JVMFLAGS"
fi

It should use the standard idiom, e.g. if [ "x$SERVER_JVMFLAGS" != "x" ].
newbie 225490 No Perforce job exists for this issue. 1 32585
7 years, 26 weeks, 6 days ago
Reviewed
0|i05xv3:
ZooKeeper ZOOKEEPER-1375

SendThread is exiting after OOMError

Bug Open Major Unresolved Unassigned Rakesh Radhakrishnan Rakesh Radhakrishnan 25/Jan/12 03:43   12/Sep/13 18:47   3.4.0       0 5   After reviewing the ClientCnxn code, there is still a chance that the SendThread exits without notifying the user. Suppose the client throws an OOMError and enters the Throwable block. While sending the Disconnected event, it creates a "new WatchedEvent()" object. This can throw another OOMError, causing the SendThread to exit without any Disconnected event notification.

{noformat}
try {
    //...
} catch (Throwable e) {
    //..
    cleanup();
    if (state.isAlive()) {
        eventThread.queueEvent(
                new WatchedEvent(Event.EventType.None, Event.KeeperState.Disconnected, null));
    }
    //....
}
{noformat}
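One possible mitigation, sketched under assumptions: preallocate the Disconnected event when the thread is constructed, so the Throwable handler does not need "new" to notify the user. EventSender and DisconnectedEvent are hypothetical stand-ins for SendThread and WatchedEvent. This narrows the window rather than closing it, since other code on that path (cleanup, logging, locking) may still allocate.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class EventSender implements Runnable {
    // Hypothetical event type standing in for WatchedEvent.
    static final class DisconnectedEvent {
        @Override
        public String toString() { return "Disconnected"; }
    }

    // Allocated up front, so the error path below does not need "new".
    private static final DisconnectedEvent DISCONNECTED = new DisconnectedEvent();

    // ArrayBlockingQueue stores elements in a preallocated array, so offer() does not
    // create a per-element node the way LinkedBlockingQueue does.
    final BlockingQueue<Object> events = new ArrayBlockingQueue<>(16);
    private final Runnable work;

    EventSender(Runnable work) { this.work = work; }

    @Override
    public void run() {
        try {
            work.run();
        } catch (Throwable t) {
            // Notify the user with the preallocated event instead of allocating a new one.
            events.offer(DISCONNECTED);
        }
    }
}
```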
225232 No Perforce job exists for this issue. 0 32586
6 years, 28 weeks ago 0|i05xvb:
ZooKeeper ZOOKEEPER-1374

C client multi-threaded test suite fails to compile on ARM architectures.

Bug Resolved Minor Fixed James Page James Page James Page 24/Jan/12 10:22   28/Jun/16 04:37 06/Feb/12 04:54 3.3.4 3.4.3, 3.5.0 c client   0 3   Ubuntu 12.04 (precise) armel or armhf The multi-threaded test suite fails to build on ARM architectures:

g++ -DHAVE_CONFIG_H -I. -I./include -I./tests -I./generated -D_FORTIFY_SOURCE=2 -DUSE_STATIC_LIB -DTHREADED -DZKSERVER_CMD="\"./tests/zkServer.sh\"" -Wall -g -MT zktest_mt-ThreadingUtil.o -MD -MP -MF .deps/zktest_mt-ThreadingUtil.Tpo -c -o zktest_mt-ThreadingUtil.o `test -f 'tests/ThreadingUtil.cc' || echo './'`tests/ThreadingUtil.cc
/tmp/ccqJWQRC.s: Assembler messages:
/tmp/ccqJWQRC.s:373: Error: bad instruction `lock xaddl r4,[r3,#0]'
/tmp/ccqJWQRC.s:425: Error: bad instruction `lock xchgl r4,[r3,#0]'

gcc does provide alternative primitives (__sync_*) that offer better cross-platform compatibility, but using them assumes that a) gcc is being used, or b) the primitives are provided by alternative compilers.

Tracked in Ubuntu here: https://bugs.launchpad.net/ubuntu/+source/zookeeper/+bug/920871
225124 No Perforce job exists for this issue. 2 32587
8 years, 7 weeks, 3 days ago
Reviewed
0|i05xvj:
ZooKeeper ZOOKEEPER-1373

Hardcoded SASL login context name clashes with Hadoop security configuration override

Bug Resolved Major Fixed Eugene Joseph Koontz Thomas Weise Thomas Weise 23/Jan/12 22:40   01/May/13 22:29 06/Feb/12 03:37 3.4.2 3.4.3, 3.5.0 java client   0 4   I'm trying to configure a process with Hadoop security (Hive metastore server) to talk to ZooKeeper 3.4.2 with Kerberos authentication. In this scenario Hadoop controls the SASL configuration (org.apache.hadoop.security.UserGroupInformation.HadoopConfiguration), instead of the ZooKeeper "Client" loginContext being set up via jaas.conf and the system property

{{-Djava.security.auth.login.config}}

Using the Hadoop configuration would work, except that the ZooKeeper client code expects the loginContextName to be "Client" while Hadoop security uses "hadoop-keytab-kerberos". I verified in the debugger that changing the name makes SASL authentication succeed; otherwise the login configuration cannot be resolved and the connection to ZooKeeper is unauthenticated.

To integrate with Hadoop, the following in ZooKeeperSaslClient would need to change to make the name configurable:

{{login = new Login("Client",new ClientCallbackHandler(null));}}
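A minimal sketch of making the name configurable with a system property that falls back to "Client". The property name used here, zookeeper.sasl.clientconfig, is an assumption for this sketch (later ZooKeeper releases document a client property with this name, but check the documentation for your version). LoginContextName is a hypothetical helper whose result would be passed where "Client" is currently hardcoded.

```java
class LoginContextName {
    static final String PROPERTY = "zookeeper.sasl.clientconfig";
    static final String DEFAULT = "Client";

    // Returns the JAAS login context section the client should use.
    static String resolve() {
        return System.getProperty(PROPERTY, DEFAULT);
    }
}
```

A Hadoop-managed process would then start with -Dzookeeper.sasl.clientconfig=hadoop-keytab-kerberos, while existing deployments that rely on the "Client" section keep working unchanged.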
225065 No Perforce job exists for this issue. 7 32588
8 years, 7 weeks, 2 days ago 0|i05xvr:
ZooKeeper ZOOKEEPER-1372

stat reports inconsistent zxids across servers after a leader change

Bug Open Major Unresolved Unassigned Patrick D. Hunt Patrick D. Hunt 23/Jan/12 20:33   23/Jan/12 20:33   3.4.2   quorum   0 1   I started a 2-server ensemble, made some changes to znodes, then shut down the cluster.

I then removed the datadir from the original leader.

I then restarted the entire ensemble.

After this, the new leader had a zxid of 0x400000000 while the follower reported a zxid of 0x300000007 (the last zxid of the old epoch). This was via stat.

I then connected a client to the ensemble, subsequent to which the zxid was again in sync. The data all seemed fine, but stat was reporting invalid information until a client connected.
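For reading the values above: a zxid packs the leader epoch into its high 32 bits and a per-epoch counter into its low 32 bits. The helper below (ZxidParts, a hypothetical name) splits the two values from this report. 0x400000000 is epoch 4, counter 0, which is a new epoch with no transactions yet. 0x300000007 is epoch 3, counter 7.

```java
class ZxidParts {
    // High 32 bits: epoch of the leader that proposed the transaction.
    static long epoch(long zxid) { return zxid >>> 32; }

    // Low 32 bits: counter within that epoch.
    static long counter(long zxid) { return zxid & 0xffffffffL; }

    static String describe(long zxid) {
        return "epoch=" + epoch(zxid) + " counter=" + counter(zxid);
    }
}
```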

225059 No Perforce job exists for this issue. 0 32589
8 years, 9 weeks, 2 days ago 0|i05xvz:
ZooKeeper ZOOKEEPER-1371

Remove dependency on log4j in the source code.

Bug Closed Major Fixed Mohammad Arshad Mahadev Konar Mahadev Konar 23/Jan/12 19:30   24/Feb/20 20:21 21/Nov/15 16:21 3.4.0, 3.4.1, 3.4.2, 3.4.3 3.5.2, 3.6.0     6 24   ZOOKEEPER-850 added slf4j to ZK. We still depend on log4j in our codebase. We should remove the dependency on log4j so that we can make logging pluggable.
patch 225049 No Perforce job exists for this issue. 5 32590
2 years, 50 weeks, 2 days ago 0|i05xw7:
ZooKeeper ZOOKEEPER-1370

Add logging changes in Release Notes needed for clients because of ZOOKEEPER-850.

Bug Resolved Major Fixed Mahadev Konar Mahadev Konar Mahadev Konar 23/Jan/12 19:28   06/Feb/12 05:29 06/Feb/12 05:29   3.4.3     0 1   225048 No Perforce job exists for this issue. 0 32591
8 years, 7 weeks, 3 days ago 0|i05xwf:
ZooKeeper ZOOKEEPER-1369

Mock access to time-related methods

Improvement Open Major Unresolved Unassigned Henry Robinson Henry Robinson 23/Jan/12 15:47   23/Jan/12 15:47           0 0   As we began to discuss in ZOOKEEPER-1366, it would be great to have the ability to mock out time methods anywhere to help with deterministic, more efficient testing.

The general idea is to have a 'mock clock' that any thread can interact with as though it were the real clock. Time would typically be advanced by an independent thread of control (normally the thread that the test is running in).

There are two main method calls that interact with the JVM clock:

# {{System.currentTimeMillis}} - very easy to mock
# {{Thread.sleep}} - slightly harder, since the mock clock would need to keep an ordered list of threads that need to be woken up and release a barrier for each one as time was advanced.

Other implicit methods, such as setting the socket rx timeout, are probably too hard to mock and are out of scope for this ticket.
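A minimal sketch of the "mock clock" idea. It assumes code under test is written against a small Clock interface instead of calling System.currentTimeMillis and Thread.sleep directly; Clock, MockClock, advance and sleeperCount are hypothetical names. Sleepers block until another thread advances time past their wake-up deadline. Instead of an ordered list with one barrier per thread, this version uses a single monitor with notifyAll, which is simpler but wakes every sleeper on each advance.

```java
interface Clock {
    long currentTimeMillis();
    void sleep(long millis) throws InterruptedException;
}

class MockClock implements Clock {
    private long now;
    private int sleepers = 0;

    MockClock(long start) { this.now = start; }

    @Override
    public synchronized long currentTimeMillis() { return now; }

    // Blocks until advance() has moved the clock to or past the wake-up time.
    @Override
    public synchronized void sleep(long millis) throws InterruptedException {
        long wakeAt = now + millis;
        sleepers++;
        try {
            while (now < wakeAt) {
                wait(); // releases the monitor until advance() calls notifyAll()
            }
        } finally {
            sleepers--;
        }
    }

    // Called by the test thread to move time forward and release due sleepers.
    synchronized void advance(long millis) {
        now += millis;
        notifyAll();
    }

    // Number of threads currently blocked in sleep(); lets a test wait until they are parked.
    synchronized int sleeperCount() { return sleepers; }
}
```

A test would wait until sleeperCount() reports the expected number of parked threads before calling advance(). Otherwise a thread that starts sleeping late computes its deadline from the already-advanced time.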
225028 No Perforce job exists for this issue. 0 41986
8 years, 9 weeks, 3 days ago 0|i07jvb:
ZooKeeper ZOOKEEPER-1368

zookeeper c client get apis crash if zhandle is null

Bug Open Major Unresolved Unassigned Marc Celani Marc Celani 21/Jan/12 23:06   23/Jan/12 15:11       c client   0 2 604800 604800 0% Although wget, awget, wexists, awexists, wgetchildren, awgetchildren will return ZBADARGUMENTS when zh is null, the get APIs will crash if you request a watch, as they dereference the zh without checking for null in order to get the watch function. 0% 0% 604800 604800 newbie 224832 No Perforce job exists for this issue. 0 32592
8 years, 9 weeks, 4 days ago 0|i05xwn:
ZooKeeper ZOOKEEPER-1367

Data inconsistencies and unexpired ephemeral nodes after cluster restart

Bug Resolved Blocker Fixed Benjamin Reed Jeremy Stribling Jeremy Stribling 20/Jan/12 13:48   28/Aug/13 18:20 31/Jan/12 01:56 3.4.2 3.4.3, 3.3.5, 3.5.0 server   0 9   Debian Squeeze, 64-bit In one of our tests, we have a cluster of three ZooKeeper servers. We kill all three, and then restart just two of them. Sometimes we notice that on one of the restarted servers, ephemeral nodes from previous sessions do not get deleted, while on the other server they do. We are effectively running 3.4.2, though technically we are running 3.4.1 with the patch manually applied for ZOOKEEPER-1333 and a C client for 3.4.1 with the patches for ZOOKEEPER-1163.

I noticed that when I connected using zkCli.sh to the first node (90.0.0.221, zkid 84), I saw only one znode in a particular path:

{quote}
[zk: 90.0.0.221:2888(CONNECTED) 0] ls /election/zkrsm
[nominee0000000011]
[zk: 90.0.0.221:2888(CONNECTED) 1] get /election/zkrsm/nominee0000000011
90.0.0.222:7777
cZxid = 0x400000027
ctime = Thu Jan 19 08:18:24 UTC 2012
mZxid = 0x400000027
mtime = Thu Jan 19 08:18:24 UTC 2012
pZxid = 0x400000027
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0xa234f4f3bc220001
dataLength = 16
numChildren = 0
{quote}

However, when I connected zkCli.sh to the second server (90.0.0.222, zkid 251), I saw three znodes under that same path:

{quote}
[zk: 90.0.0.222:2888(CONNECTED) 2] ls /election/zkrsm
nominee0000000006 nominee0000000010 nominee0000000011
[zk: 90.0.0.222:2888(CONNECTED) 2] get /election/zkrsm/nominee0000000011
90.0.0.222:7777
cZxid = 0x400000027
ctime = Thu Jan 19 08:18:24 UTC 2012
mZxid = 0x400000027
mtime = Thu Jan 19 08:18:24 UTC 2012
pZxid = 0x400000027
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0xa234f4f3bc220001
dataLength = 16
numChildren = 0
[zk: 90.0.0.222:2888(CONNECTED) 3] get /election/zkrsm/nominee0000000010
90.0.0.221:7777
cZxid = 0x30000014c
ctime = Thu Jan 19 07:53:42 UTC 2012
mZxid = 0x30000014c
mtime = Thu Jan 19 07:53:42 UTC 2012
pZxid = 0x30000014c
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0xa234f4f3bc220000
dataLength = 16
numChildren = 0
[zk: 90.0.0.222:2888(CONNECTED) 4] get /election/zkrsm/nominee0000000006
90.0.0.223:7777
cZxid = 0x200000cab
ctime = Thu Jan 19 08:00:30 UTC 2012
mZxid = 0x200000cab
mtime = Thu Jan 19 08:00:30 UTC 2012
pZxid = 0x200000cab
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x5434f5074e040002
dataLength = 16
numChildren = 0
{quote}

These never went away for the lifetime of the server, for any clients connected directly to that server. Note that this cluster is configured to have all three servers still, the third one being down (90.0.0.223, zkid 162).

I captured the data/snapshot directories for the two live servers. When I start single-node servers using each directory, I can briefly see that the inconsistent data is present in those logs, though the ephemeral nodes seem to get (correctly) cleaned up pretty soon after I start the server.

I will upload a tar containing the debug logs and data directories from the failure. I think we can reproduce it regularly if you need more info.
224696 No Perforce job exists for this issue. 6 32593
6 years, 30 weeks, 1 day ago Fix Data inconsistencies and unexpired ephemeral nodes after cluster restart.
Reviewed
0|i05xwv:
ZooKeeper ZOOKEEPER-1366

Zookeeper should be tolerant of clock adjustments

Bug Resolved Critical Fixed Hongchao Deng Ted Dunning Ted Dunning 19/Jan/12 02:00   11/May/17 16:34 06/Feb/15 00:19   3.5.1, 3.6.0     7 32   ZOOKEEPER-1626 If you want to wreak havoc on a ZK based system just do [date -s "+1hour"] and watch the mayhem as all sessions expire at once.

This shouldn't happen. ZooKeeper could easily handle elapsed times as elapsed times rather than as differences between absolute times. Absolute times are subject to adjustment when the clock is set, while a timer is not subject to this problem. In Java, System.currentTimeMillis() gives you absolute (wall-clock) time, while System.nanoTime() gives you time based on a timer from an arbitrary origin.

I have done this and have been running tests now for some tens of minutes with no failures. I will set up a test machine to redo the build again on Ubuntu and post a patch here for discussion.
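A minimal sketch of the approach described. Session deadlines are computed from System.nanoTime(), which is monotonic and unaffected by wall-clock changes such as date -s, instead of from System.currentTimeMillis(). SessionDeadline is a hypothetical class. The time source is injectable (a LongSupplier) so the logic can be exercised without waiting.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

class SessionDeadline {
    private final LongSupplier nanoClock;
    private final long timeoutNanos;
    private long lastHeardNanos;

    SessionDeadline(long timeoutMillis, LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
        this.timeoutNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        this.lastHeardNanos = nanoClock.getAsLong();
    }

    // Production use: the monotonic JVM timer.
    SessionDeadline(long timeoutMillis) {
        this(timeoutMillis, System::nanoTime);
    }

    // Record that we heard from the client (e.g. a ping arrived).
    void touch() { lastHeardNanos = nanoClock.getAsLong(); }

    // Compare elapsed time, not absolute timestamps; the subtraction stays correct
    // even if the nanoTime value wraps around.
    boolean isExpired() {
        return nanoClock.getAsLong() - lastHeardNanos > timeoutNanos;
    }
}
```

Because only differences between two nanoTime readings are used, setting the system clock forward by an hour has no effect on which sessions expire.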
224432 No Perforce job exists for this issue. 18 32594
5 years, 6 weeks, 6 days ago 0|i05xx3:
ZooKeeper ZOOKEEPER-1365

Removing a duplicate function and another minor cleanup in QuorumPeer.java

Improvement Resolved Trivial Not A Problem Alexander Shraer Alexander Shraer Alexander Shraer 18/Jan/12 20:11   19/Jan/12 20:12 19/Jan/12 20:12     server   0 0   - getMyId() and getId() in QuorumPeer are doing the same thing
- QuorumPeer.quorumPeers is read directly from outside QuorumPeer, although we have the getter QuorumPeer.getView().

The purpose of this cleanup is to make it easier later to change the way QuorumPeer manages its list of peers (to support dynamic changes in this list).
224410 No Perforce job exists for this issue. 2 33285
8 years, 9 weeks, 6 days ago 0|i0626n:
ZooKeeper ZOOKEEPER-1364

Add orthogonal fault injection mechanism/framework

Test Open Major Unresolved Andrei Savu Andrei Savu Andrei Savu 17/Jan/12 10:56   14/Jan/20 05:56       tests   0 3 0 3600   Hadoop has a mechanism for doing fault injection (HDFS-435). I think it would be useful if something similar were available for ZooKeeper. 100% 100% 3600 0 pull-request-available 224149 No Perforce job exists for this issue. 0 41987
3 years, 12 weeks, 2 days ago 0|i07jvj:
ZooKeeper ZOOKEEPER-1363

Categorise unit tests by 'test-commit', 'full-test' etc

Improvement Resolved Major Won't Fix Mark Fenes Henry Robinson Henry Robinson 17/Jan/12 01:33   23/Mar/18 10:43 23/Mar/18 10:43     build, tests   0 2   As discussed on the list, it would be good to split the Java test suite into categories so that it's easy to run a small set of unit tests against a patch, and to leave Jenkins to run the full suite of stress tests etc. newbie 224104 No Perforce job exists for this issue. 0 41988
1 year, 51 weeks, 6 days ago 0|i07jvr:
ZooKeeper ZOOKEEPER-1362

ZooDefs.Ids ACL lists not immutable

Improvement Open Trivial Unresolved Unassigned Tassos Souris Tassos Souris 16/Jan/12 14:51   23/Nov/16 11:52       java client   0 3   In org.apache.zookeeper:
1) ZooDefs.Ids.OPEN_ACL_UNSAFE
2) ZooDefs.Ids.CREATOR_ALL_ACL
3) ZooDefs.Ids.READ_ALL_ACL
are not immutable lists. Although it is unlikely, a client could alter them.
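A minimal sketch of one fix: wrap the shared constants with Collections.unmodifiableList, so callers get an UnsupportedOperationException instead of silently altering a global. AclConstants and the string entries are hypothetical placeholders; the real lists hold ACL objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class AclConstants {
    // Placeholder entries standing in for the real ACL objects.
    static final List<String> OPEN_ACL_UNSAFE =
            Collections.unmodifiableList(new ArrayList<>(Arrays.asList("world:anyone:cdrwa")));

    static final List<String> CREATOR_ALL_ACL =
            Collections.unmodifiableList(new ArrayList<>(Arrays.asList("auth::cdrwa")));

    static final List<String> READ_ALL_ACL =
            Collections.unmodifiableList(new ArrayList<>(Arrays.asList("world:anyone:r")));
}
```

Callers that need a modified ACL copy the constant first (new ArrayList<>(AclConstants.OPEN_ACL_UNSAFE)) and edit the copy, which leaves the shared list untouched.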
224068 No Perforce job exists for this issue. 0 41989
3 years, 17 weeks, 1 day ago 0|i07jvz:
ZooKeeper ZOOKEEPER-1361

Leader.lead iterates over 'learners' set without proper synchronisation

Bug Resolved Major Fixed Henry Robinson Henry Robinson Henry Robinson 13/Jan/12 12:43   17/Sep/12 01:04 17/Sep/12 01:04 3.4.2 3.4.4, 3.5.0     0 5   This block:

{code}
HashSet<Long> followerSet = new HashSet<Long>();
for(LearnerHandler f : learners)
followerSet.add(f.getSid());
{code}

is executed without holding the lock on learners, so if there were ever a condition where a new learner was added during the initial sync phase, I'm pretty sure we'd see a concurrent modification exception. Certainly other parts of the code are very careful to lock on learners when iterating.

It would be nice to use a {{ConcurrentHashMap}} to hold the learners instead, but I can't convince myself that this wouldn't introduce some correctness bugs. For example the following:

Learners contains A, B, C, D
Thread 1 iterates over learners, and gets as far as B.
Thread 2 removes A, and adds E.
Thread 1 continues iterating and sees a learner view of A, B, C, D, E

This may be a bug if Thread 1 is counting the number of synced followers for a quorum count, since at no point was A, B, C, D, E a correct view of the quorum.

In practice, I think this is actually ok, because I don't think ZK makes any strong ordering guarantees on learners joining or leaving (so we don't need a strong serialisability guarantee on learners) but I don't think I'll make that change for this patch. Instead I want to clean up the locking protocols on the follower / learner sets - to avoid another easy deadlock like the one we saw in ZOOKEEPER-1294 - and to do less with the lock held; i.e. to copy and then iterate over the copy rather than iterate over a locked set.
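The copy-then-iterate approach described above might look like the following sketch. `LearnerHandler` here is a minimal hypothetical stand-in exposing only `getSid()`, and `LearnerRegistry` is an illustrative name, not a class in the ZooKeeper code base.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the real LearnerHandler. */
final class LearnerHandler {
    private final long sid;

    LearnerHandler(long sid) { this.sid = sid; }

    long getSid() { return sid; }
}

/** Hypothetical container; all access to 'learners' is guarded by its own monitor. */
final class LearnerRegistry {
    private final Set<LearnerHandler> learners = new HashSet<>();

    void addLearner(LearnerHandler lh) {
        synchronized (learners) {
            learners.add(lh);
        }
    }

    /** Copies the set while holding the lock so callers can iterate without it. */
    List<LearnerHandler> getLearnersSnapshot() {
        synchronized (learners) {
            return new ArrayList<>(learners);
        }
    }

    /** Same result as the followerSet loop, but iterating over a private copy. */
    Set<Long> followerSids() {
        Set<Long> followerSet = new HashSet<>();
        for (LearnerHandler f : getLearnersSnapshot()) {
            followerSet.add(f.getSid());
        }
        return followerSet;
    }
}
```

Because the copy is taken in a single critical section, it reflects one instant of the set, which also avoids the mixed A, B, C, D, E view described above; the lock is held only for the copy.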
223846 No Perforce job exists for this issue. 5 32595
7 years, 27 weeks, 3 days ago 0|i05xxb:
ZooKeeper ZOOKEEPER-1360

QuorumTest.testNoLogBeforeLeaderEstablishment has several problems

Bug Open Major Unresolved Abraham Fine Henry Robinson Henry Robinson 12/Jan/12 02:44   05/Feb/20 07:16   3.4.2 3.7.0, 3.5.8 tests   0 4   After the apparently valid fix to ZOOKEEPER-1294, testNoLogBeforeLeaderEstablishment is failing for me about one time in four. While I investigate whether the patch in 1294 is ultimately to blame, reading the test has brought to light a number of issues that appear to be bugs or that need improvement:

* As part of QuorumTest, an ensemble is already established by the fixture setup code, but it is apparently unused by the test, which uses QuorumUtil instead.
* The test reads QuorumPeer.leader and QuorumPeer.follower without synchronization, which means that writes to those fields may not be published when we come to read them.
* The return value of sem.tryAcquire is never checked.
* The progress of the test is based on ad-hoc timings (25 * 500ms sleeps) and inscrutable numbers of iterations through the main loop (e.g. the semaphore blocking the final asserts is released only after the 20000th of 50000 callbacks)
* The test as a whole takes ~30s to run

The first three are easy to fix (as part of fixing the second, I intend to hide all members of QuorumPeer behind getters and setters); the fourth and fifth need a slightly deeper understanding of what the test is trying to achieve.
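For the second and third issues, a sketch under stated assumptions: the field is published through a `volatile` field behind a getter, and the semaphore wait is bounded with its result checked. `PeerState`, `setLeaderSid`, and `SemaphoreChecks.awaitOrFail` are hypothetical names for illustration, not QuorumPeer's actual members.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical holder: volatile makes a write by one thread visible to readers on others. */
final class PeerState {
    private volatile Long leaderSid; // null until a leader is known

    void setLeaderSid(long sid) { leaderSid = sid; }

    Long getLeaderSid() { return leaderSid; }
}

/** Hypothetical test helper. */
final class SemaphoreChecks {
    private SemaphoreChecks() {}

    /** Waits up to timeoutMs for a permit and fails loudly instead of ignoring a timeout. */
    static void awaitOrFail(Semaphore sem, long timeoutMs) {
        try {
            if (!sem.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS)) {
                throw new AssertionError("timed out after " + timeoutMs + " ms waiting for permit");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new AssertionError("interrupted while waiting for permit", e);
        }
    }
}
```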
223665 No Perforce job exists for this issue. 0 32596
2 years, 31 weeks, 3 days ago 0|i05xxj:
ZooKeeper ZOOKEEPER-1359

ZkCli create command data and acl parts should be optional.

Bug Resolved Trivial Duplicate Unassigned kavita sharma kavita sharma 10/Jan/12 03:46   01/Jul/13 17:37 16/Dec/12 01:50     java client   0 5   In zkCli, a node is created even if no data is given, but the commandMap entry
says:
{noformat}
commandMap.put("create", "[-s] [-e] path data acl");
{noformat}
which means the data and acl parts are mandatory. We need to mark these parts as optional.
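A hypothetical sketch of parsing the `create` arguments with `data` and `acl` treated as optional, matching a usage string of the form `create [-s] [-e] path [data] [acl]`. `CreateArgs` is an illustrative class, not the actual ZooKeeperMain code, and the acl argument is kept as a raw string.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical parsed form of: create [-s] [-e] path [data] [acl] */
final class CreateArgs {
    static final String USAGE = "create [-s] [-e] path [data] [acl]";

    boolean sequential;
    boolean ephemeral;
    String path;
    String data; // null when omitted
    String acl;  // null when omitted

    static CreateArgs parse(String commandLine) {
        List<String> tokens = Arrays.asList(commandLine.trim().split("\\s+"));
        if (!tokens.get(0).equals("create")) {
            throw new IllegalArgumentException(USAGE);
        }
        CreateArgs args = new CreateArgs();
        int i = 1;
        // Flags may appear in any order before the path.
        while (i < tokens.size() && tokens.get(i).startsWith("-")) {
            switch (tokens.get(i)) {
                case "-s": args.sequential = true; break;
                case "-e": args.ephemeral = true; break;
                default: throw new IllegalArgumentException(USAGE);
            }
            i++;
        }
        int remaining = tokens.size() - i;
        if (remaining < 1 || remaining > 3) {
            throw new IllegalArgumentException(USAGE); // path is required, at most data and acl follow
        }
        args.path = tokens.get(i);
        if (remaining >= 2) args.data = tokens.get(i + 1);
        if (remaining == 3) args.acl = tokens.get(i + 2);
        return args;
    }
}
```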
new 223378 No Perforce job exists for this issue. 0 32597
6 years, 38 weeks, 3 days ago 0|i05xxr:
ZooKeeper ZOOKEEPER-1358

In StaticHostProviderTest.java, testNextDoesNotSleepForZero tests that hostProvider.next(0) doesn't sleep by checking that the latency of this call is less than 10sec

Bug Resolved Trivial Fixed Alexander Shraer Alexander Shraer Alexander Shraer 09/Jan/12 20:46   15/Jan/12 22:56 15/Jan/12 21:20   3.5.0     0 1   The test should check for a much smaller bound, perhaps 1 ms or 5 ms. 223356 No Perforce job exists for this issue. 2 32598
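A sketch of the tighter check, assuming the call under test is passed in as a `Runnable`; `LatencyCheck` and its methods are hypothetical helpers. Measuring with `System.nanoTime()` (a monotonic clock) rather than wall-clock time avoids spurious results from clock adjustments.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical test helper for latency bounds. */
final class LatencyCheck {
    private LatencyCheck() {}

    /** Runs the action once and returns its elapsed time in whole milliseconds. */
    static long elapsedMillis(Runnable action) {
        long start = System.nanoTime(); // monotonic, unlike currentTimeMillis()
        action.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    /** Fails if the action took maxMillis or longer. */
    static void assertFasterThan(Runnable action, long maxMillis) {
        long elapsed = elapsedMillis(action);
        if (elapsed >= maxMillis) {
            throw new AssertionError("took " + elapsed + " ms, expected < " + maxMillis + " ms");
        }
    }
}
```

In the test this could be used as `LatencyCheck.assertFasterThan(() -> hostProvider.next(0), 5);`. A 5 ms bound is tight enough to catch an accidental sleep while leaving some headroom for scheduling noise on a loaded build machine.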
8 years, 10 weeks, 3 days ago 0|i05xxz:
ZooKeeper ZOOKEEPER-1357

Zab1_0Test uses hard-wired port numbers. Specifically, it uses the same port for leader in two different tests. The second test periodically fails complaining that the port is still in use.

Bug Resolved Minor Fixed Alexander Shraer Alexander Shraer Alexander Shraer 09/Jan/12 18:04   14/Apr/14 18:31 14/Apr/14 17:53 3.5.0 3.5.0 tests   0 4   Here's what I get:


{noformat}
Testcase: testLeaderInConnectingFollowers took 34.117 sec
Testcase: testLastAcceptedEpoch took 0.047 sec <----- new test added in ZK-1343
Testcase: testLeaderInElectingFollowers took 0.004 sec
Caused an ERROR
Address already in use
java.net.BindException: Address already in use
at java.net.PlainSocketImpl.socketBind(Native Method)
at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:383)
at java.net.ServerSocket.bind(ServerSocket.java:328)
at java.net.ServerSocket.<init>(ServerSocket.java:194)
at java.net.ServerSocket.<init>(ServerSocket.java:106)
at org.apache.zookeeper.server.quorum.Leader.<init>(Leader.java:220)
at org.apache.zookeeper.server.quorum.Zab1_0Test.createLeader(Zab1_0Test.java:711)
at org.apache.zookeeper.server.quorum.Zab1_0Test.testLeaderInElectingFollowers(Zab1_0Test.java:225)

Testcase: testNormalFollowerRun took 29.128 sec
Testcase: testNormalRun took 25.158 sec
Testcase: testLeaderBehind took 25.148 sec
Testcase: testAbandonBeforeACKEpoch took 34.029 sec
{noformat}


My guess is that testLastAcceptedEpoch doesn't properly close the connection before testLeaderInElectingFollowers starts.
I propose adding the following to the test:

{code}
if (leadThread != null) {
    leadThread.interrupt();
    leadThread.join();
}
{code}


In addition, I propose changing the hard-wired ports in Zab1_0Test to use PortAssignment.unique(), as other tests do. If I understand correctly, the static counter that unique() uses to assign ports is initialized once per test file, so this would also prevent the problem I'm seeing here of two tests in the same file trying to use the same port.
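The per-JVM counter idea can be sketched as follows. This is a hypothetical stand-in, not the real `PortAssignment` test utility: the base port 11221 is an arbitrary assumption, and a real helper would also need to skip ports already bound by other processes.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for PortAssignment: hands out distinct ports within one JVM. */
final class UniquePorts {
    private static final int BASE_PORT = 11221; // arbitrary assumption
    private static final int MAX_PORT = 65535;
    // Static, so it lives as long as the class loader that loaded it.
    private static final AtomicInteger NEXT = new AtomicInteger(BASE_PORT);

    private UniquePorts() {}

    /** Returns a port not previously returned in this JVM; safe to call from many threads. */
    static int unique() {
        int port = NEXT.getAndIncrement();
        if (port > MAX_PORT) {
            throw new IllegalStateException("ran out of ports");
        }
        return port;
    }
}
```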

The error can be reproduced using the attached patch (for some reason I don't see the problem in the trunk).

223337 No Perforce job exists for this issue. 2 32599
5 years, 49 weeks, 3 days ago 0|i05xy7:
ZooKeeper ZOOKEEPER-1356

Avoid permanent caching of server IPs in the client

Bug Resolved Major Duplicate Neha Narkhede Neha Narkhede Neha Narkhede 09/Jan/12 17:42   05/Feb/16 04:54 10/Jan/12 11:39 3.3.4, 3.4.2   java client   0 3   Relevant conversation on the dev mailing list: http://markmail.org/message/3vzynx6rgurubf3p?q=Performing+no+downtime+hardware+changes+to+a+live+zookeeper+cluster+list:org%2Eapache%2Ehadoop%2Ezookeeper-dev

Basically, the client caches the list of server IPs internally and keeps that list for its entire lifetime. This makes it hard to remove or change a server node in a ZooKeeper cluster without restarting every client. Also, having two levels of IP caching, one in the JVM and one in the ZooKeeper client code, seems unnecessary.

It would be ideal to provide a config option that would turn off this IP caching in the client and re-resolve the host names during the reconnect.
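A minimal sketch of re-resolving on each reconnect attempt, assuming the client keeps the unresolved `host:port` strings instead of `InetSocketAddress` objects resolved once at construction. `ServerListResolver` is a hypothetical name. The JVM keeps its own DNS cache, controlled by the `networkaddress.cache.ttl` security property, so that layer still applies.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: keeps host names and resolves them afresh on demand. */
final class ServerListResolver {
    private final List<String> hostPorts; // e.g. "zk1.example.com:2181"

    ServerListResolver(List<String> hostPorts) {
        this.hostPorts = new ArrayList<>(hostPorts);
    }

    /** Resolves every entry now; call this on each reconnect instead of caching the result. */
    List<InetSocketAddress> resolveNow() {
        List<InetSocketAddress> result = new ArrayList<>();
        for (String hp : hostPorts) {
            int colon = hp.lastIndexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("expected host:port, got " + hp);
            }
            String host = hp.substring(0, colon);
            int port = Integer.parseInt(hp.substring(colon + 1));
            // This constructor performs a lookup (subject to the JVM's DNS cache).
            InetSocketAddress addr = new InetSocketAddress(host, port);
            if (!addr.isUnresolved()) {
                result.add(addr); // hosts that currently fail to resolve are skipped
            }
        }
        return result;
    }
}
```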
223333 No Perforce job exists for this issue. 0 32600
4 years, 6 weeks, 6 days ago 0|i05xyf:
ZooKeeper ZOOKEEPER-1355

Add zk.updateServerList(newServerList)

New Feature Resolved Major Fixed Alexander Shraer Alexander Shraer Alexander Shraer 09/Jan/12 17:16   13/Feb/20 14:05 17/Nov/12 09:03   3.5.0 c client, java client   3 13   When the set of servers changes, we would like to update the server list stored by clients without restarting the clients.
Moreover, assuming that the number of clients per server is the same (in expectation) in the old configuration (as guaranteed by the current list shuffling for example), we would like to re-balance client connections across the new set of servers in a way that a) the number of clients per server is the same for all servers (in expectation) and b) there is no excessive/unnecessary client migration.

It is simple to achieve (a) without (b) - just re-shuffle the new list of servers at every client. But this would create unnecessary migration, which we'd like to avoid.

We propose a simple probabilistic migration scheme that achieves (a) and (b): each client locally decides whether and where to migrate when the list of servers changes. The attached document describes the scheme and shows an evaluation of it in ZooKeeper. We also implemented re-balancing through a consistent-hashing scheme and show a comparison. We derived the probabilistic migration rules from a simple formula that we can also provide, if anyone is interested in the proof.
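The attached document is not reproduced here, but the simplest case of such a scheme can be sketched under one assumption: servers are only added, so the old list is a subset of the new one. If each client stays with probability |old|/|new| and otherwise moves to a uniformly random newly added server, then with n clients each old server keeps (n/|old|)(|old|/|new|) = n/|new| clients in expectation, and each added server receives n(1 - |old|/|new|)/(|new| - |old|) = n/|new|. That satisfies (a) while moving only the necessary fraction of clients. `MigrationSketch.chooseServer` is a hypothetical name, and removals or mixed changes need different probabilities that this sketch does not cover.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Hypothetical sketch of local, probabilistic rebalancing when servers are only added. */
final class MigrationSketch {
    private MigrationSketch() {}

    /** Stays on 'current' with probability |old|/|new|, else picks a random added server. */
    static String chooseServer(List<String> oldServers, List<String> newServers,
                               String current, Random rnd) {
        Set<String> oldSet = new HashSet<>(oldServers);
        Set<String> newSet = new HashSet<>(newServers);
        if (!newSet.containsAll(oldSet) || !oldSet.contains(current)) {
            throw new IllegalArgumentException("sketch only handles pure additions");
        }
        List<String> added = new ArrayList<>();
        for (String s : newSet) {
            if (!oldSet.contains(s)) {
                added.add(s);
            }
        }
        if (added.isEmpty()) {
            return current; // nothing changed, so nobody needs to move
        }
        double stayProbability = (double) oldSet.size() / newSet.size();
        if (rnd.nextDouble() < stayProbability) {
            return current;
        }
        return added.get(rnd.nextInt(added.size()));
    }
}
```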
223330 No Perforce job exists for this issue. 35 2598
7 years, 18 weeks, 5 days ago
Reviewed
0|i00ssv:
ZooKeeper ZOOKEEPER-1354

AuthTest.testBadAuthThenSendOtherCommands fails intermittently

Bug Resolved Major Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 06/Jan/12 20:10   01/Mar/12 22:21 01/Mar/12 22:05 3.4.0 3.4.4, 3.5.0 tests   0 1   I'm seeing the following intermittent failure:

{noformat}
junit.framework.AssertionFailedError: Should have called my watcher expected:<1> but was:<0>
at org.apache.zookeeper.test.AuthTest.testBadAuthThenSendOtherCommands(AuthTest.java:89)
at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
{noformat}

The following commit introduced this test:

bq. ZOOKEEPER-1152. Exceptions thrown from handleAuthentication can cause buffer corruption issues in NIOServer. (camille via breed)

+ Assert.assertEquals("Should have called my watcher",
+ 1, authFailed.get());

I think it's due to either a) the code not waiting for the
notification to be propagated, or b) the message not making it back
from the server to the client before the socket or the ClientCnxn
is closed.

What do you think: should I just wait for the notification to arrive, or do you think it's b)?
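If the cause is the first one, the assertion could wait for the counter with a deadline instead of reading it once. A sketch, assuming the watcher increments an `AtomicInteger` as `authFailed` does in the test; `WaitUtil.waitForCount` is a hypothetical helper and the 10 ms poll interval is arbitrary.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical test helper. */
final class WaitUtil {
    private WaitUtil() {}

    /** Polls until counter reaches expected or timeoutMs elapses; returns whether it did. */
    static boolean waitForCount(AtomicInteger counter, int expected, long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (counter.get() < expected) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(10); // arbitrary poll interval
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

The test would then assert `waitForCount(authFailed, 1, timeout)` first and check `authFailed.get() == 1` afterwards. If the cause is the second one, waiting only turns the failure into a timeout, which at least makes the two cases distinguishable.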

223123 No Perforce job exists for this issue. 1 12513
8 years, 3 weeks, 6 days ago 0|i02hz3:
ZooKeeper ZOOKEEPER-1353

C client test suite fails consistently

Bug Resolved Minor Fixed Clint Byrum Clint Byrum Clint Byrum 06/Jan/12 16:42   06/Feb/12 05:58 06/Feb/12 03:00 3.3.4 3.4.3, 3.3.5, 3.5.0 c client, tests   0 2 300 300 0% Ubuntu precise (dev release), amd64 When the c client test suite, zktest-mt, is run, it fails with this:

tests/TestZookeeperInit.cc:233: Assertion: equality assertion failed [Expected: 2, Actual : 22]

This was also reported in 3.3.1 here:

http://www.mail-archive.com/zookeeper-dev@hadoop.apache.org/msg08914.html

The C client tests make some assumptions that are not valid. getaddrinfo may, at one time, have returned ENOENT instead of EINVAL for the host given in the test. The assertion should simply accept either EINVAL or ENOENT, so that builds do not break regardless of which of the two the platform returns.

0% 0% 300 300 patch, test 223107 No Perforce job exists for this issue. 2 32601
8 years, 7 weeks, 3 days ago
Reviewed
0|i05xyn:
ZooKeeper ZOOKEEPER-1352

server.InvalidSnapshotTest is using connection timeouts that are too short

Bug Resolved Major Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 05/Jan/12 14:12   06/Feb/12 05:58 06/Feb/12 03:52 3.3.4 3.4.3, 3.3.5, 3.5.0 tests   0 1   InvalidSnapshotTest is using connection timeouts that are too short; see this false failure:
https://builds.apache.org/job/ZooKeeper_branch33_solaris/65/testReport/junit/org.apache.zookeeper.server/InvalidSnapshotTest/testInvalidSnapshot/

{noformat}
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /invalidsnap-0
at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:643)
at org.apache.zookeeper.server.InvalidSnapshotTest.testInvalidSnapshot(InvalidSnapshotTest.java:71)
{noformat}

Also, looking at the test itself, it could use some cleanup (e.g. reusing features from the ClientBase test utilities).
222894 No Perforce job exists for this issue. 4 32602
8 years, 7 weeks, 3 days ago
Reviewed
0|i05xyv:
ZooKeeper ZOOKEEPER-1351

invalid test verification in MultiTransactionTest

Bug Resolved Major Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 04/Jan/12 17:55   15/Jan/12 22:56 15/Jan/12 21:35 3.4.0 3.4.3, 3.5.0 tests   0 1   Tests such as org.apache.zookeeper.test.MultiTransactionTest.testWatchesTriggered() are incorrect. I see two issues:

1) zk.sync is asynchronous; there is no guarantee that the watcher has been called by the time sync returns.

{noformat}
zk.sync("/", null, null);
assertTrue(watcher.triggered); /// incorrect assumption
{noformat}

The callback needs to be implemented; only once the callback has been called can we verify the trigger.

2) triggered is not declared volatile, even though it is set from a different thread (the EventThread).

See https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-trunk-solaris/91/testReport/junit/org.apache.zookeeper.test/MultiTransactionTest/testWatchesTriggered/
for an example of a false positive failure

{noformat}
junit.framework.AssertionFailedError
at org.apache.zookeeper.test.MultiTransactionTest.testWatchesTriggered(MultiTransactionTest.java:236)
at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
{noformat}
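A sketch of both fixes: the flag is `volatile`, and the test blocks on the sync callback (here a `CountDownLatch`) before checking it. The single-threaded executor is a hypothetical stand-in for the client's event thread, and the sketch assumes, as the issue does, that the watch event is delivered before the sync callback on that thread. `FlagWatcher` and `SyncThenCheck` are illustrative names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical watcher flag written by the event thread and read by the test thread. */
final class FlagWatcher {
    volatile boolean triggered; // volatile: reads on other threads see the latest write
}

/** Hypothetical simulation of "watch event, then sync callback" on one event thread. */
final class SyncThenCheck {
    private SyncThenCheck() {}

    /** Waits for the simulated sync callback, then returns the watcher flag. */
    static boolean runAndCheck(long timeoutMs) {
        FlagWatcher watcher = new FlagWatcher();
        CountDownLatch synced = new CountDownLatch(1);
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        try {
            eventThread.execute(() -> watcher.triggered = true); // watch notification
            eventThread.execute(synced::countDown);               // sync callback
            if (!synced.await(timeoutMs, TimeUnit.MILLISECONDS)) {
                throw new AssertionError("sync callback never arrived");
            }
            return watcher.triggered; // only checked after the callback has run
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new AssertionError("interrupted", e);
        } finally {
            eventThread.shutdownNow();
        }
    }
}
```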
222765 No Perforce job exists for this issue. 2 32603
8 years, 10 weeks, 3 days ago 0|i05xz3:
ZooKeeper ZOOKEEPER-1350

Please make JMX registration optional in LearnerZooKeeperServer

Improvement Patch Available Major Unresolved Jordan Zimmerman Jordan Zimmerman Jordan Zimmerman 03/Jan/12 17:36   05/Feb/20 07:11   3.4.0 3.7.0, 3.5.8 server   4 5   LearnerZooKeeperServer has no option to disable JMX registrations. Curator has a test ZK server cluster. Due to the intricacies of JMX, the registrations cannot easily be undone. To make the Curator test cluster reusable within a testing session, Javassist ugliness was needed to turn LearnerZooKeeperServer.registerJMX() and LearnerZooKeeperServer.unregisterJMX() into no-ops.

I suggest a simple system property to turn the registration off.
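A sketch of the suggested switch using `Boolean.getBoolean`, which returns true only when the named system property equals "true" (case-insensitively). The property name `zookeeper.learner.jmx.disabled` and the class are hypothetical, and the real MBean registration is replaced by a counter so the effect is observable.

```java
/** Hypothetical sketch: JMX registration guarded by a system property. */
final class JmxAwareServer {
    // Hypothetical property name; not an existing ZooKeeper setting.
    static final String DISABLE_JMX_PROPERTY = "zookeeper.learner.jmx.disabled";

    int registrations; // stands in for real MBean registration in this sketch

    static boolean jmxEnabled() {
        // Boolean.getBoolean is true only when the property is set to "true" (any case).
        return !Boolean.getBoolean(DISABLE_JMX_PROPERTY);
    }

    void registerJMX() {
        if (!jmxEnabled()) {
            return; // no-op, so test harnesses need no bytecode tricks
        }
        registrations++; // real code would register MBeans here
    }
}
```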
222617 No Perforce job exists for this issue. 4 2510
1 year, 45 weeks ago 0|i00s9b:
ZooKeeper ZOOKEEPER-1349

Support starting zkCli.sh in readonly mode

Improvement Resolved Major Duplicate Rakesh Radhakrishnan Rakesh Radhakrishnan Rakesh Radhakrishnan 02/Jan/12 10:10   11/Mar/14 04:35 11/Mar/14 04:35 3.4.0   java client   1 3   Start zkCli.sh in read-only mode. The ZooKeeper client supports read-only mode, so it would be desirable for the admin shell to provide the same support and to be able to show the status.

Suggestion:
Add one more parameter specifying read-only mode, as follows:

./zkCli.sh -server 10.18.52.144:2179:readonly
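A hypothetical sketch of stripping the proposed `:readonly` suffix from the server argument. The resulting flag could then be passed as the `canBeReadOnly` argument of the `ZooKeeper` constructor; `ReadOnlyServerArg` is an illustrative name, not existing zkCli code.

```java
/** Hypothetical parse of a -server argument with an optional ":readonly" suffix. */
final class ReadOnlyServerArg {
    private static final String SUFFIX = ":readonly";

    final String connectString;
    final boolean canBeReadOnly;

    private ReadOnlyServerArg(String connectString, boolean canBeReadOnly) {
        this.connectString = connectString;
        this.canBeReadOnly = canBeReadOnly;
    }

    static ReadOnlyServerArg parse(String serverArg) {
        if (serverArg.endsWith(SUFFIX)) {
            // Drop the suffix so the remaining connect string is unchanged.
            String cs = serverArg.substring(0, serverArg.length() - SUFFIX.length());
            return new ReadOnlyServerArg(cs, true);
        }
        return new ReadOnlyServerArg(serverArg, false);
    }
}
```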
222497 No Perforce job exists for this issue. 0 41990
6 years, 2 weeks, 2 days ago 0|i07jw7:
ZooKeeper ZOOKEEPER-1348

Zookeeper 3.4.2 C client incorrectly reports string version of 3.4.1

Bug Resolved Major Fixed Mahadev Konar Marshall McMullen Marshall McMullen 30/Dec/11 21:58   06/Feb/12 03:20 06/Feb/12 03:20 3.4.2 3.4.3 c client   0 1   When running the 3.4.2 C client, it shows the following output:

Client environment:zookeeper.version=zookeeper C client 3.4.1

This should show "3.4.2" not "3.4.1". The problem looks to be caused by stale autoconf files in the C directory.

{noformat}
grep -R "zookeeper C client 3.4.1" *

autom4te.cache/output.0:@%:@ Generated by GNU Autoconf 2.59 for zookeeper C client 3.4.1.
autom4te.cache/output.0:PACKAGE_STRING='zookeeper C client 3.4.1'
autom4te.cache/output.0:\`configure' configures zookeeper C client 3.4.1 to adapt to many kinds of systems.
autom4te.cache/output.0: short | recursive ) echo "Configuration of zookeeper C client 3.4.1:";;
autom4te.cache/output.1:@%:@ Generated by GNU Autoconf 2.59 for zookeeper C client 3.4.1.
autom4te.cache/output.1:PACKAGE_STRING='zookeeper C client 3.4.1'
autom4te.cache/output.1:\`configure' configures zookeeper C client 3.4.1 to adapt to many kinds of systems.
autom4te.cache/output.1: short | recursive ) echo "Configuration of zookeeper C client 3.4.1:";;
config.h:#define PACKAGE_STRING "zookeeper C client 3.4.1"
config.log:| #define PACKAGE_STRING "zookeeper C client 3.4.1"
config.log:| #define PACKAGE_STRING "zookeeper C client 3.4.1"
config.log:| #define PACKAGE_STRING "zookeeper C client 3.4.1"
config.log:| #define PACKAGE_STRING "zookeeper C client 3.4.1"
config.log:| #define PACKAGE_STRING "zookeeper C client 3.4.1"
config.log:| #define PACKAGE_STRING "zookeeper C client 3.4.1"
config.log:| #define PACKAGE_STRING "zookeeper C client 3.4.1"
config.log:PACKAGE_STRING='zookeeper C client 3.4.1'
config.log:#define PACKAGE_STRING "zookeeper C client 3.4.1"
config.status:s,@PACKAGE_STRING@,zookeeper C client 3.4.1,;t t
config.status:${ac_dA}PACKAGE_STRING${ac_dB}PACKAGE_STRING${ac_dC}"zookeeper C client 3.4.1"${ac_dD}
config.status:${ac_uA}PACKAGE_STRING${ac_uB}PACKAGE_STRING${ac_uC}"zookeeper C client 3.4.1"${ac_uD}
configure:# Generated by GNU Autoconf 2.59 for zookeeper C client 3.4.1.
configure:PACKAGE_STRING='zookeeper C client 3.4.1'
configure:\`configure' configures zookeeper C client 3.4.1 to adapt to many kinds of systems.
configure: short | recursive ) echo "Configuration of zookeeper C client 3.4.1:";;
Binary file libzkmt_la-zookeeper.o matches
Makefile:PACKAGE_STRING = zookeeper C client 3.4.1
{noformat}
222399 No Perforce job exists for this issue. 0 32604
8 years, 7 weeks, 3 days ago 0|i05xzb:
ZooKeeper ZOOKEEPER-1347

ZOOKEEPER-1346 Change cnxns to use a concurrent data structure

Sub-task Open Major Unresolved Unassigned Camille Fournier Camille Fournier 29/Dec/11 18:09   05/Feb/20 07:16     3.7.0, 3.5.8 server   0 3   Cnxns is currently stored as a HashSet but may be accessed by multiple threads concurrently. Instead of doing our own sync we should investigate using a proper concurrent data structure for this. 222307 No Perforce job exists for this issue. 0 41991
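One candidate is a concurrent set backed by `ConcurrentHashMap` (Java 8+). Its iterators are weakly consistent: they never throw `ConcurrentModificationException` but may or may not reflect concurrent updates, which is the trade-off discussed in ZOOKEEPER-1361. `Cnxn` and `CnxnRegistry` are hypothetical stand-ins for the server connection type and its container.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a server-side connection. */
final class Cnxn {
    final long sessionId;

    Cnxn(long sessionId) { this.sessionId = sessionId; }
}

/** Hypothetical container replacing a HashSet plus external synchronization. */
final class CnxnRegistry {
    // Thread-safe set: add, remove and iteration need no external lock.
    private final Set<Cnxn> cnxns = ConcurrentHashMap.newKeySet();

    void add(Cnxn c) { cnxns.add(c); }

    void remove(Cnxn c) { cnxns.remove(c); }

    int size() { return cnxns.size(); }

    /** Safe to call while other threads add or remove connections. */
    long countSessionsAbove(long threshold) {
        long n = 0;
        for (Cnxn c : cnxns) {
            if (c.sessionId > threshold) {
                n++;
            }
        }
        return n;
    }
}
```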
5 years, 37 weeks, 5 days ago 0|i07jwf:
ZooKeeper ZOOKEEPER-1346

Add Jetty HTTP server support for four letter words.

Improvement Resolved Major Fixed Bill Havanki Camille Fournier Camille Fournier 29/Dec/11 18:07   11/May/17 21:48 17/Jul/14 20:20   3.5.0 server   1 15   ZOOKEEPER-1347 Move the 4lws to their own port, off of the client port, and support them properly via long-lived sessions instead of polling. Deprecate the 4lw support on the client port. This will enable us to enhance the functionality of the commands via extended command syntax, address security concerns, and fix bugs where the socket close is received on the client end before all of the data. 222305 No Perforce job exists for this issue. 10 4493
2 years, 44 weeks, 6 days ago
Reviewed
0|i014gv:
ZooKeeper ZOOKEEPER-1345

Add a .gitignore file with general exclusions and Eclipse project files excluded

Improvement Resolved Trivial Fixed Harsh J Harsh J Harsh J 29/Dec/11 11:29   31/Dec/11 05:57 30/Dec/11 16:36 3.5.0 3.4.3, 3.3.5, 3.5.0 build   0 1   I looked for a .gitignore file in the ZK sources but could not find one.

Preferably, we could add one with the following:

{code}
.classpath
.eclipse/
.project
.revision/
.settings/
build/
src/c/generated/
src/java/generated/
src/java/lib/ant-eclipse-1.0-jvm1.2.jar
src/java/lib/ivy-2.2.0.jar
{code}

This would avoid losing much when running "git clean -fd" and the like to clean up working directories during development, and would greatly help those who contribute via git mirrors.
222275 No Perforce job exists for this issue. 1 33286
8 years, 12 weeks, 5 days ago
Reviewed
0|i0626v:
ZooKeeper ZOOKEEPER-1344

ZooKeeper client multi-update command is not considering the Chroot request

Bug Resolved Critical Fixed Rakesh Radhakrishnan Rakesh Radhakrishnan Rakesh Radhakrishnan 26/Dec/11 08:36   18/Mar/12 00:54 16/Mar/12 20:32 3.4.0 3.4.4, 3.5.0 java client   0 3   For example:
I created a ZooKeeper client with the chroot "10.18.52.144:2179/apps/X" and then generated an Op for creating the znode "/myid". When the client submits this op through multi, the ZooKeeper server actually creates the path "/myid" instead of "/apps/X/myid".

Expected result: the znode should be created as "/apps/X/myid".
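The expected behaviour amounts to applying to every operation inside a multi the same chroot prefix that a single `create` would get. A sketch of that prefixing, with hypothetical helpers `prependChroot` and `prependChrootAll`; the client's own internal implementation is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical chroot path helpers. */
final class ChrootPaths {
    private ChrootPaths() {}

    /** Maps a client-visible path to the server path under the chroot (null means no chroot). */
    static String prependChroot(String chroot, String clientPath) {
        if (chroot == null) {
            return clientPath;
        }
        // "/" under chroot "/apps/X" is "/apps/X" itself, not "/apps/X/".
        if (clientPath.equals("/")) {
            return chroot;
        }
        return chroot + clientPath;
    }

    /** Applies the chroot to every op path in a multi. */
    static List<String> prependChrootAll(String chroot, List<String> opPaths) {
        List<String> result = new ArrayList<>(opPaths.size());
        for (String p : opPaths) {
            result.add(prependChroot(chroot, p));
        }
        return result;
    }
}
```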
222059 No Perforce job exists for this issue. 5 32605
8 years, 1 week, 4 days ago
Incompatible change
0|i05xzj:
ZooKeeper ZOOKEEPER-1343

getEpochToPropose should check if lastAcceptedEpoch is greater than or equal to epoch

Bug Resolved Critical Fixed Flavio Paiva Junqueira Flavio Paiva Junqueira Flavio Paiva Junqueira 23/Dec/11 07:44   09/Jan/12 17:50 03/Jan/12 19:28 3.4.0 3.4.3, 3.5.0     0 1   The following block in Leader.getEpochToPropose:

{noformat}
if (lastAcceptedEpoch > epoch) {
epoch = lastAcceptedEpoch+1;
}
{noformat}

needs to be fixed, since it doesn't increment the epoch variable when epoch != -1 (its initial value) and lastAcceptedEpoch is equal to epoch. The fix is trivial: change > to >=.
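A minimal, self-contained illustration of why `>=` is needed. The method names are hypothetical, and the rest of `getEpochToPropose` (collecting epochs from followers and waiting for a quorum) is omitted.

```java
/** Hypothetical illustration of the epoch comparison, before and after the fix. */
final class EpochProposal {
    private EpochProposal() {}

    /** Original condition: when lastAcceptedEpoch == epoch, epoch is not advanced. */
    static long buggy(long epoch, long lastAcceptedEpoch) {
        if (lastAcceptedEpoch > epoch) {
            epoch = lastAcceptedEpoch + 1;
        }
        return epoch;
    }

    /** Fixed condition: an equal lastAcceptedEpoch also forces a strictly larger epoch. */
    static long fixed(long epoch, long lastAcceptedEpoch) {
        if (lastAcceptedEpoch >= epoch) {
            epoch = lastAcceptedEpoch + 1;
        }
        return epoch;
    }
}
```

With `epoch == -1` both versions return `lastAcceptedEpoch + 1`, which is why the problem only shows up once epoch has already been raised to a value equal to `lastAcceptedEpoch`: `buggy(5, 5)` returns 5, an epoch that was already accepted, while `fixed(5, 5)` returns 6.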
221962 No Perforce job exists for this issue. 4 32606
8 years, 11 weeks, 3 days ago
Reviewed
0|i05xzr:
ZooKeeper ZOOKEEPER-1342

quorum Listener & LearnerCnxAcceptor are missing thread names

Improvement Resolved Minor Fixed Rakesh Radhakrishnan Rakesh Radhakrishnan Rakesh Radhakrishnan 23/Dec/11 00:15   22/Apr/13 16:02 27/Dec/11 19:29   3.5.0 quorum   0 2   derby_triage10_5_2 221922 No Perforce job exists for this issue. 2 33287
8 years, 13 weeks, 1 day ago
Reviewed
0|i06273:
ZooKeeper ZOOKEEPER-1341

problem handling invalid multi op in processTxn

Bug Open Major Unresolved Unassigned Patrick D. Hunt Patrick D. Hunt 22/Dec/11 14:55   22/Dec/11 14:55   3.4.0   server   0 0   The handling of an invalid multi op in org.apache.zookeeper.server.DataTree.processTxn(TxnHeader, Record) is unusual and looks wrong to me.

In particular, an IOException is thrown and then essentially ignored; it seems to me we should properly fail the operation instead. This will become more important if we add new op types going forward.

The use of assert is a bit suspect as well, though perhaps it's fine; I'm not sure. (We don't explicitly turn on assertions in our tests, so I'm not sure how useful it is regardless.)

Also notice that the catch of IOException ignores the result. It seems to me that handling this exception should be localized to the multi block (separating it out into its own method seems like a good idea).

We should add a test for this case.
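One way to localize the handling, sketched with hypothetical names: the multi-specific logic lives in its own method, and an unknown sub-op type yields an explicit failure result instead of an exception that is thrown and then swallowed. The op-type codes and the `Result` shape are illustrative, not the real `ZooDefs.OpCode` values or `ProcessTxnResult`.

```java
import java.util.List;

/** Hypothetical sketch of explicit failure for invalid sub-ops in a multi. */
final class MultiTxnSketch {
    // Illustrative op-type codes only.
    static final int CREATE = 1;
    static final int DELETE = 2;

    /** Outcome of a multi: either all ops are valid, or the index of the first invalid one. */
    static final class Result {
        final boolean ok;
        final int failedIndex; // -1 when ok
        final String error;    // null when ok

        Result(boolean ok, int failedIndex, String error) {
            this.ok = ok;
            this.failedIndex = failedIndex;
            this.error = error;
        }
    }

    private MultiTxnSketch() {}

    /** Validates every sub-op up front so an unknown type fails the whole multi explicitly. */
    static Result processMulti(List<Integer> opTypes) {
        for (int i = 0; i < opTypes.size(); i++) {
            int type = opTypes.get(i);
            if (type != CREATE && type != DELETE) {
                return new Result(false, i, "invalid op type " + type);
            }
        }
        // Applying the validated ops is omitted in this sketch.
        return new Result(true, -1, null);
    }
}
```

A test for this case could then submit a multi containing an unknown type and assert that the whole operation fails with an error identifying the offending op.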
221885 No Perforce job exists for this issue. 0 32607
8 years, 14 weeks ago 0|i05xzz:
ZooKeeper ZOOKEEPER-1340

multi problem - typical user operations are generating ERROR level messages in the server

Bug Resolved Major Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 22/Dec/11 13:03   06/Feb/12 05:58 06/Feb/12 04:50 3.4.0 3.4.3, 3.5.0 server   0 1   Multi operations run by users are generating ERROR level messages in the server log even though they are typical user level operations that are not in any way impacting the server, example:

{noformat}
2011-12-22 09:55:06,538 [myid:] - ERROR [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@545] - >>>> Got user-level KeeperException when processing sessionid:0x13466e9828c0000 type:multi cxid:0x3 zxid:0x2 txntype:2 reqpath:n/a Error Path:/nonexisting Error:KeeperErrorCode = NoNode for /nonexisting
2011-12-22 09:55:06,538 [myid:] - ERROR [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@549] - >>>> ABORTING remaing MultiOp ops
{noformat}

This is misleading. We should demote these messages to INFO level at most (this is what we do for other such user-level errors, e.g. NoNode).
221866 No Perforce job exists for this issue. 2 32608
8 years, 7 weeks, 3 days ago Unwanted ERROR messages in the logs. 0|i05y07:
ZooKeeper ZOOKEEPER-1339

C client doesn't build with --enable-debug

Bug Resolved Major Fixed Eric Liang Jakub Lekstan Jakub Lekstan 22/Dec/11 05:00   08/May/12 14:04 08/May/12 12:39 3.4.1 3.3.6, 3.4.4, 3.5.0 c client   0 3   Ubuntu 11.04 When I try to build the 3.4.1 C client with the --enable-debug switch, I get the following error:

{code}
make all-am
make[1]: Entering directory `/home/jlekstan/zookeeper-3.4.1/src/c'
if /bin/bash ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT zookeeper.lo -MD -MP -MF ".deps/zookeeper.Tpo" -c -o zookeeper.lo `test -f 'src/zookeeper.c' || echo './'`src/zookeeper.c; \
then mv -f ".deps/zookeeper.Tpo" ".deps/zookeeper.Plo"; else rm -f ".deps/zookeeper.Tpo"; exit 1; fi
mkdir .libs
gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT zookeeper.lo -MD -MP -MF .deps/zookeeper.Tpo -c src/zookeeper.c -fPIC -DPIC -o .libs/zookeeper.o
gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT zookeeper.lo -MD -MP -MF .deps/zookeeper.Tpo -c src/zookeeper.c -o zookeeper.o >/dev/null 2>&1
if /bin/bash ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT recordio.lo -MD -MP -MF ".deps/recordio.Tpo" -c -o recordio.lo `test -f 'src/recordio.c' || echo './'`src/recordio.c; \
then mv -f ".deps/recordio.Tpo" ".deps/recordio.Plo"; else rm -f ".deps/recordio.Tpo"; exit 1; fi
gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT recordio.lo -MD -MP -MF .deps/recordio.Tpo -c src/recordio.c -fPIC -DPIC -o .libs/recordio.o
gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT recordio.lo -MD -MP -MF .deps/recordio.Tpo -c src/recordio.c -o recordio.o >/dev/null 2>&1
if /bin/bash ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT zookeeper.jute.lo -MD -MP -MF ".deps/zookeeper.jute.Tpo" -c -o zookeeper.jute.lo `test -f 'generated/zookeeper.jute.c' || echo './'`generated/zookeeper.jute.c; \
then mv -f ".deps/zookeeper.jute.Tpo" ".deps/zookeeper.jute.Plo"; else rm -f ".deps/zookeeper.jute.Tpo"; exit 1; fi
gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT zookeeper.jute.lo -MD -MP -MF .deps/zookeeper.jute.Tpo -c generated/zookeeper.jute.c -fPIC -DPIC -o .libs/zookeeper.jute.o
gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT zookeeper.jute.lo -MD -MP -MF .deps/zookeeper.jute.Tpo -c generated/zookeeper.jute.c -o zookeeper.jute.o >/dev/null 2>&1
if /bin/bash ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT zk_log.lo -MD -MP -MF ".deps/zk_log.Tpo" -c -o zk_log.lo `test -f 'src/zk_log.c' || echo './'`src/zk_log.c; \
then mv -f ".deps/zk_log.Tpo" ".deps/zk_log.Plo"; else rm -f ".deps/zk_log.Tpo"; exit 1; fi
gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT zk_log.lo -MD -MP -MF .deps/zk_log.Tpo -c src/zk_log.c -fPIC -DPIC -o .libs/zk_log.o
gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT zk_log.lo -MD -MP -MF .deps/zk_log.Tpo -c src/zk_log.c -o zk_log.o >/dev/null 2>&1
if /bin/bash ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT zk_hashtable.lo -MD -MP -MF ".deps/zk_hashtable.Tpo" -c -o zk_hashtable.lo `test -f 'src/zk_hashtable.c' || echo './'`src/zk_hashtable.c; \
then mv -f ".deps/zk_hashtable.Tpo" ".deps/zk_hashtable.Plo"; else rm -f ".deps/zk_hashtable.Tpo"; exit 1; fi
gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT zk_hashtable.lo -MD -MP -MF .deps/zk_hashtable.Tpo -c src/zk_hashtable.c -fPIC -DPIC -o .libs/zk_hashtable.o
gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT zk_hashtable.lo -MD -MP -MF .deps/zk_hashtable.Tpo -c src/zk_hashtable.c -o zk_hashtable.o >/dev/null 2>&1
if /bin/bash ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT st_adaptor.lo -MD -MP -MF ".deps/st_adaptor.Tpo" -c -o st_adaptor.lo `test -f 'src/st_adaptor.c' || echo './'`src/st_adaptor.c; \
then mv -f ".deps/st_adaptor.Tpo" ".deps/st_adaptor.Plo"; else rm -f ".deps/st_adaptor.Tpo"; exit 1; fi
gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT st_adaptor.lo -MD -MP -MF .deps/st_adaptor.Tpo -c src/st_adaptor.c -fPIC -DPIC -o .libs/st_adaptor.o
gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT st_adaptor.lo -MD -MP -MF .deps/st_adaptor.Tpo -c src/st_adaptor.c -o st_adaptor.o >/dev/null 2>&1
/bin/bash ./libtool --tag=CC --mode=link gcc -Wall -Werror -g -O0 -D_GNU_SOURCE -o libzkst.la zookeeper.lo recordio.lo zookeeper.jute.lo zk_log.lo zk_hashtable.lo st_adaptor.lo -lm
ar cru .libs/libzkst.a .libs/zookeeper.o .libs/recordio.o .libs/zookeeper.jute.o .libs/zk_log.o .libs/zk_hashtable.o .libs/st_adaptor.o
ranlib .libs/libzkst.a
creating libzkst.la
(cd .libs && rm -f libzkst.la && ln -s ../libzkst.la libzkst.la)
if /bin/bash ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT hashtable_itr.lo -MD -MP -MF ".deps/hashtable_itr.Tpo" -c -o hashtable_itr.lo `test -f 'src/hashtable/hashtable_itr.c' || echo './'`src/hashtable/hashtable_itr.c; \
then mv -f ".deps/hashtable_itr.Tpo" ".deps/hashtable_itr.Plo"; else rm -f ".deps/hashtable_itr.Tpo"; exit 1; fi
gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT hashtable_itr.lo -MD -MP -MF .deps/hashtable_itr.Tpo -c src/hashtable/hashtable_itr.c -fPIC -DPIC -o .libs/hashtable_itr.o
gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT hashtable_itr.lo -MD -MP -MF .deps/hashtable_itr.Tpo -c src/hashtable/hashtable_itr.c -o hashtable_itr.o >/dev/null 2>&1
if /bin/bash ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT hashtable.lo -MD -MP -MF ".deps/hashtable.Tpo" -c -o hashtable.lo `test -f 'src/hashtable/hashtable.c' || echo './'`src/hashtable/hashtable.c; \
then mv -f ".deps/hashtable.Tpo" ".deps/hashtable.Plo"; else rm -f ".deps/hashtable.Tpo"; exit 1; fi
gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT hashtable.lo -MD -MP -MF .deps/hashtable.Tpo -c src/hashtable/hashtable.c -fPIC -DPIC -o .libs/hashtable.o
gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT hashtable.lo -MD -MP -MF .deps/hashtable.Tpo -c src/hashtable/hashtable.c -o hashtable.o >/dev/null 2>&1
/bin/bash ./libtool --tag=CC --mode=link gcc -Wall -Werror -g -O0 -D_GNU_SOURCE -o libhashtable.la hashtable_itr.lo hashtable.lo
ar cru .libs/libhashtable.a .libs/hashtable_itr.o .libs/hashtable.o
ranlib .libs/libhashtable.a
creating libhashtable.la
(cd .libs && rm -f libhashtable.la && ln -s ../libhashtable.la libhashtable.la)
/bin/bash ./libtool --tag=CC --mode=link gcc -Wall -Werror -g -O0 -D_GNU_SOURCE -o libzookeeper_st.la -rpath /usr/local/lib -no-undefined -version-info 2 -export-symbols-regex '(zoo_|zookeeper_|zhandle|Z|format_log_message|log_message|logLevel|deallocate_|zerror|is_unrecoverable)' libzkst.la libhashtable.la
generating symbol list for `libzookeeper_st.la'
/usr/bin/nm -B ./.libs/libzkst.a ./.libs/libhashtable.a | sed -n -e 's/^.*[ ]\([ABCDGIRSTW][ABCDGIRSTW]*\)[ ][ ]*\([_A-Za-z][_A-Za-z0-9]*\)$/\1 \2 \2/p' | /bin/sed 's/.* //' | sort | uniq > .libs/libzookeeper_st.exp
grep -E -e "(zoo_|zookeeper_|zhandle|Z|format_log_message|log_message|logLevel|deallocate_|zerror|is_unrecoverable)" ".libs/libzookeeper_st.exp" > ".libs/libzookeeper_st.expT"
mv -f ".libs/libzookeeper_st.expT" ".libs/libzookeeper_st.exp"
echo "{ global:" > .libs/libzookeeper_st.ver
cat .libs/libzookeeper_st.exp | sed -e "s/\(.*\)/\1;/" >> .libs/libzookeeper_st.ver
echo "local: *; };" >> .libs/libzookeeper_st.ver
gcc -shared -Wl,--whole-archive ./.libs/libzkst.a ./.libs/libhashtable.a -Wl,--no-whole-archive -lm -Wl,-soname -Wl,libzookeeper_st.so.2 -Wl,-version-script -Wl,.libs/libzookeeper_st.ver -o .libs/libzookeeper_st.so.2.0.0
(cd .libs && rm -f libzookeeper_st.so.2 && ln -s libzookeeper_st.so.2.0.0 libzookeeper_st.so.2)
(cd .libs && rm -f libzookeeper_st.so && ln -s libzookeeper_st.so.2.0.0 libzookeeper_st.so)
rm -fr .libs/libzookeeper_st.lax
mkdir .libs/libzookeeper_st.lax
rm -fr .libs/libzookeeper_st.lax/libzkst.a
mkdir .libs/libzookeeper_st.lax/libzkst.a
(cd .libs/libzookeeper_st.lax/libzkst.a && ar x /home/jlekstan/zookeeper-3.4.1/src/c/./.libs/libzkst.a)
rm -fr .libs/libzookeeper_st.lax/libhashtable.a
mkdir .libs/libzookeeper_st.lax/libhashtable.a
(cd .libs/libzookeeper_st.lax/libhashtable.a && ar x /home/jlekstan/zookeeper-3.4.1/src/c/./.libs/libhashtable.a)
ar cru .libs/libzookeeper_st.a .libs/libzookeeper_st.lax/libzkst.a/zookeeper.o .libs/libzookeeper_st.lax/libzkst.a/st_adaptor.o .libs/libzookeeper_st.lax/libzkst.a/recordio.o .libs/libzookeeper_st.lax/libzkst.a/zk_hashtable.o .libs/libzookeeper_st.lax/libzkst.a/zk_log.o .libs/libzookeeper_st.lax/libzkst.a/zookeeper.jute.o .libs/libzookeeper_st.lax/libhashtable.a/hashtable_itr.o .libs/libzookeeper_st.lax/libhashtable.a/hashtable.o
ranlib .libs/libzookeeper_st.a
rm -fr .libs/libzookeeper_st.lax
creating libzookeeper_st.la
(cd .libs && rm -f libzookeeper_st.la && ln -s ../libzookeeper_st.la libzookeeper_st.la)
if /bin/bash ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-zookeeper.lo -MD -MP -MF ".deps/libzkmt_la-zookeeper.Tpo" -c -o libzkmt_la-zookeeper.lo `test -f 'src/zookeeper.c' || echo './'`src/zookeeper.c; \
then mv -f ".deps/libzkmt_la-zookeeper.Tpo" ".deps/libzkmt_la-zookeeper.Plo"; else rm -f ".deps/libzkmt_la-zookeeper.Tpo"; exit 1; fi
gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-zookeeper.lo -MD -MP -MF .deps/libzkmt_la-zookeeper.Tpo -c src/zookeeper.c -fPIC -DPIC -o .libs/libzkmt_la-zookeeper.o
gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-zookeeper.lo -MD -MP -MF .deps/libzkmt_la-zookeeper.Tpo -c src/zookeeper.c -o libzkmt_la-zookeeper.o >/dev/null 2>&1
if /bin/bash ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-recordio.lo -MD -MP -MF ".deps/libzkmt_la-recordio.Tpo" -c -o libzkmt_la-recordio.lo `test -f 'src/recordio.c' || echo './'`src/recordio.c; \
then mv -f ".deps/libzkmt_la-recordio.Tpo" ".deps/libzkmt_la-recordio.Plo"; else rm -f ".deps/libzkmt_la-recordio.Tpo"; exit 1; fi
gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-recordio.lo -MD -MP -MF .deps/libzkmt_la-recordio.Tpo -c src/recordio.c -fPIC -DPIC -o .libs/libzkmt_la-recordio.o
gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-recordio.lo -MD -MP -MF .deps/libzkmt_la-recordio.Tpo -c src/recordio.c -o libzkmt_la-recordio.o >/dev/null 2>&1
if /bin/bash ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-zookeeper.jute.lo -MD -MP -MF ".deps/libzkmt_la-zookeeper.jute.Tpo" -c -o libzkmt_la-zookeeper.jute.lo `test -f 'generated/zookeeper.jute.c' || echo './'`generated/zookeeper.jute.c; \
then mv -f ".deps/libzkmt_la-zookeeper.jute.Tpo" ".deps/libzkmt_la-zookeeper.jute.Plo"; else rm -f ".deps/libzkmt_la-zookeeper.jute.Tpo"; exit 1; fi
gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-zookeeper.jute.lo -MD -MP -MF .deps/libzkmt_la-zookeeper.jute.Tpo -c generated/zookeeper.jute.c -fPIC -DPIC -o .libs/libzkmt_la-zookeeper.jute.o
gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-zookeeper.jute.lo -MD -MP -MF .deps/libzkmt_la-zookeeper.jute.Tpo -c generated/zookeeper.jute.c -o libzkmt_la-zookeeper.jute.o >/dev/null 2>&1
if /bin/bash ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-zk_log.lo -MD -MP -MF ".deps/libzkmt_la-zk_log.Tpo" -c -o libzkmt_la-zk_log.lo `test -f 'src/zk_log.c' || echo './'`src/zk_log.c; \
then mv -f ".deps/libzkmt_la-zk_log.Tpo" ".deps/libzkmt_la-zk_log.Plo"; else rm -f ".deps/libzkmt_la-zk_log.Tpo"; exit 1; fi
gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-zk_log.lo -MD -MP -MF .deps/libzkmt_la-zk_log.Tpo -c src/zk_log.c -fPIC -DPIC -o .libs/libzkmt_la-zk_log.o
gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-zk_log.lo -MD -MP -MF .deps/libzkmt_la-zk_log.Tpo -c src/zk_log.c -o libzkmt_la-zk_log.o >/dev/null 2>&1
if /bin/bash ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-zk_hashtable.lo -MD -MP -MF ".deps/libzkmt_la-zk_hashtable.Tpo" -c -o libzkmt_la-zk_hashtable.lo `test -f 'src/zk_hashtable.c' || echo './'`src/zk_hashtable.c; \
then mv -f ".deps/libzkmt_la-zk_hashtable.Tpo" ".deps/libzkmt_la-zk_hashtable.Plo"; else rm -f ".deps/libzkmt_la-zk_hashtable.Tpo"; exit 1; fi
gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-zk_hashtable.lo -MD -MP -MF .deps/libzkmt_la-zk_hashtable.Tpo -c src/zk_hashtable.c -fPIC -DPIC -o .libs/libzkmt_la-zk_hashtable.o
gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-zk_hashtable.lo -MD -MP -MF .deps/libzkmt_la-zk_hashtable.Tpo -c src/zk_hashtable.c -o libzkmt_la-zk_hashtable.o >/dev/null 2>&1
if /bin/bash ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-mt_adaptor.lo -MD -MP -MF ".deps/libzkmt_la-mt_adaptor.Tpo" -c -o libzkmt_la-mt_adaptor.lo `test -f 'src/mt_adaptor.c' || echo './'`src/mt_adaptor.c; \
then mv -f ".deps/libzkmt_la-mt_adaptor.Tpo" ".deps/libzkmt_la-mt_adaptor.Plo"; else rm -f ".deps/libzkmt_la-mt_adaptor.Tpo"; exit 1; fi
gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-mt_adaptor.lo -MD -MP -MF .deps/libzkmt_la-mt_adaptor.Tpo -c src/mt_adaptor.c -fPIC -DPIC -o .libs/libzkmt_la-mt_adaptor.o
gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-mt_adaptor.lo -MD -MP -MF .deps/libzkmt_la-mt_adaptor.Tpo -c src/mt_adaptor.c -o libzkmt_la-mt_adaptor.o >/dev/null 2>&1
/bin/bash ./libtool --tag=CC --mode=link gcc -Wall -Werror -g -O0 -D_GNU_SOURCE -o libzkmt.la libzkmt_la-zookeeper.lo libzkmt_la-recordio.lo libzkmt_la-zookeeper.jute.lo libzkmt_la-zk_log.lo libzkmt_la-zk_hashtable.lo libzkmt_la-mt_adaptor.lo -lm
ar cru .libs/libzkmt.a .libs/libzkmt_la-zookeeper.o .libs/libzkmt_la-recordio.o .libs/libzkmt_la-zookeeper.jute.o .libs/libzkmt_la-zk_log.o .libs/libzkmt_la-zk_hashtable.o .libs/libzkmt_la-mt_adaptor.o
ranlib .libs/libzkmt.a
creating libzkmt.la
(cd .libs && rm -f libzkmt.la && ln -s ../libzkmt.la libzkmt.la)
/bin/bash ./libtool --tag=CC --mode=link gcc -Wall -Werror -g -O0 -D_GNU_SOURCE -o libzookeeper_mt.la -rpath /usr/local/lib -no-undefined -version-info 2 -export-symbols-regex '(zoo_|zookeeper_|zhandle|Z|format_log_message|log_message|logLevel|deallocate_|zerror|is_unrecoverable)' libzkmt.la libhashtable.la -lpthread
generating symbol list for `libzookeeper_mt.la'
/usr/bin/nm -B ./.libs/libzkmt.a ./.libs/libhashtable.a | sed -n -e 's/^.*[ ]\([ABCDGIRSTW][ABCDGIRSTW]*\)[ ][ ]*\([_A-Za-z][_A-Za-z0-9]*\)$/\1 \2 \2/p' | /bin/sed 's/.* //' | sort | uniq > .libs/libzookeeper_mt.exp
grep -E -e "(zoo_|zookeeper_|zhandle|Z|format_log_message|log_message|logLevel|deallocate_|zerror|is_unrecoverable)" ".libs/libzookeeper_mt.exp" > ".libs/libzookeeper_mt.expT"
mv -f ".libs/libzookeeper_mt.expT" ".libs/libzookeeper_mt.exp"
echo "{ global:" > .libs/libzookeeper_mt.ver
cat .libs/libzookeeper_mt.exp | sed -e "s/\(.*\)/\1;/" >> .libs/libzookeeper_mt.ver
echo "local: *; };" >> .libs/libzookeeper_mt.ver
gcc -shared -Wl,--whole-archive ./.libs/libzkmt.a ./.libs/libhashtable.a -Wl,--no-whole-archive -lm -lpthread -Wl,-soname -Wl,libzookeeper_mt.so.2 -Wl,-version-script -Wl,.libs/libzookeeper_mt.ver -o .libs/libzookeeper_mt.so.2.0.0
(cd .libs && rm -f libzookeeper_mt.so.2 && ln -s libzookeeper_mt.so.2.0.0 libzookeeper_mt.so.2)
(cd .libs && rm -f libzookeeper_mt.so && ln -s libzookeeper_mt.so.2.0.0 libzookeeper_mt.so)
rm -fr .libs/libzookeeper_mt.lax
mkdir .libs/libzookeeper_mt.lax
rm -fr .libs/libzookeeper_mt.lax/libzkmt.a
mkdir .libs/libzookeeper_mt.lax/libzkmt.a
(cd .libs/libzookeeper_mt.lax/libzkmt.a && ar x /home/jlekstan/zookeeper-3.4.1/src/c/./.libs/libzkmt.a)
rm -fr .libs/libzookeeper_mt.lax/libhashtable.a
mkdir .libs/libzookeeper_mt.lax/libhashtable.a
(cd .libs/libzookeeper_mt.lax/libhashtable.a && ar x /home/jlekstan/zookeeper-3.4.1/src/c/./.libs/libhashtable.a)
ar cru .libs/libzookeeper_mt.a .libs/libzookeeper_mt.lax/libzkmt.a/libzkmt_la-zk_hashtable.o .libs/libzookeeper_mt.lax/libzkmt.a/libzkmt_la-zookeeper.o .libs/libzookeeper_mt.lax/libzkmt.a/libzkmt_la-zk_log.o .libs/libzookeeper_mt.lax/libzkmt.a/libzkmt_la-zookeeper.jute.o .libs/libzookeeper_mt.lax/libzkmt.a/libzkmt_la-recordio.o .libs/libzookeeper_mt.lax/libzkmt.a/libzkmt_la-mt_adaptor.o .libs/libzookeeper_mt.lax/libhashtable.a/hashtable_itr.o .libs/libzookeeper_mt.lax/libhashtable.a/hashtable.o
ranlib .libs/libzookeeper_mt.a
rm -fr .libs/libzookeeper_mt.lax
creating libzookeeper_mt.la
(cd .libs && rm -f libzookeeper_mt.la && ln -s ../libzookeeper_mt.la libzookeeper_mt.la)
if gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT cli.o -MD -MP -MF ".deps/cli.Tpo" -c -o cli.o `test -f 'src/cli.c' || echo './'`src/cli.c; \
then mv -f ".deps/cli.Tpo" ".deps/cli.Po"; else rm -f ".deps/cli.Tpo"; exit 1; fi
/bin/bash ./libtool --tag=CC --mode=link gcc -Wall -Werror -g -O0 -D_GNU_SOURCE -o cli_st cli.o libzookeeper_st.la
gcc -Wall -Werror -g -O0 -D_GNU_SOURCE -o .libs/cli_st cli.o ./.libs/libzookeeper_st.so -lm
./.libs/libzookeeper_st.so: undefined reference to `hashtable_iterator_value'
./.libs/libzookeeper_st.so: undefined reference to `hashtable_iterator_key'
collect2: ld returned 1 exit status
make[1]: *** [cli_st] Error 1
make[1]: Leaving directory `/home/jlekstan/zookeeper-3.4.1/src/c'
make: *** [all] Error 2
{code}
221824 No Perforce job exists for this issue. 2 32609
7 years, 46 weeks, 2 days ago 0|i05y0f:
ZooKeeper ZOOKEEPER-1338

class cast exceptions may be thrown by multi ErrorResult class (invalid equals)

Bug Resolved Major Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 21/Dec/11 19:30   06/Feb/12 05:58 06/Feb/12 05:16 3.4.0 3.4.3, 3.5.0 java client   0 1   There's a bug in ErrorResult and perhaps some of the other OpResult equals methods in multi. 221789 No Perforce job exists for this issue. 3 32610
8 years, 7 weeks, 3 days ago 0|i05y0n:
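One common way an equals() method ends up throwing ClassCastException is when it checks for a base type and then casts to a specific subtype. The minimal sketch below shows the safe pattern. It uses hypothetical stand-in classes, not ZooKeeper's actual OpResult hierarchy, and the issue does not say whether this was the exact flaw in ErrorResult.

```java
import java.util.Objects;

public final class SafeEqualsSketch {
    // Hypothetical stand-ins for multi's result types; these are NOT ZooKeeper's real classes.
    abstract static class OpResult {
    }

    static final class DeleteResult extends OpResult {
        @Override
        public boolean equals(Object o) {
            return o instanceof DeleteResult;
        }

        @Override
        public int hashCode() {
            return DeleteResult.class.hashCode();
        }
    }

    static final class ErrorResult extends OpResult {
        final int err;

        ErrorResult(int err) {
            this.err = err;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            // Checking only "o instanceof OpResult" and then casting to ErrorResult would throw
            // ClassCastException for a DeleteResult. Testing the exact type we cast to avoids
            // that (instanceof is also false for null).
            if (!(o instanceof ErrorResult)) {
                return false;
            }
            return err == ((ErrorResult) o).err;
        }

        @Override
        public int hashCode() {
            return Objects.hash(err);
        }
    }

    public static void main(String[] args) {
        OpResult error = new ErrorResult(-101);
        System.out.println(error.equals(new ErrorResult(-101)));
        System.out.println(error.equals(new DeleteResult())); // false, and no exception
    }
}
```

The same rule applies to every subclass: an equals() should only cast after confirming the argument is of the type being cast to.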
ZooKeeper ZOOKEEPER-1337

multi's "Transaction" class is missing tests.

Test Resolved Minor Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 21/Dec/11 19:19   06/Feb/12 05:58 06/Feb/12 05:00 3.4.0 3.4.3, 3.5.0 java client   0 0   Add tests for the ZooKeeper client's transaction() method. 221787 No Perforce job exists for this issue. 2 33288
8 years, 7 weeks, 3 days ago
Reviewed
0|i0627b:
ZooKeeper ZOOKEEPER-1336

javadoc for multi is confusing, references functionality that doesn't seem to exist

Bug Resolved Major Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 21/Dec/11 18:24   06/Feb/12 05:58 06/Feb/12 03:58 3.4.1 3.4.3, 3.5.0 java client   0 1   The javadoc for org.apache.zookeeper.ZooKeeper.multi(Iterable<Op>) says:

{noformat}
* Executes multiple Zookeeper operations or none of them. On success, a list of results is returned.
* On failure, only a single exception is returned. If you want more details, it may be preferable to
* use the alternative form of this method that lets you pass a list into which individual results are
* placed so that you can zero in on exactly which operation failed and why.
{noformat}

What is the "alternative form of this method" being referred to? It seems we should either add this functionality or, at the very least, update the javadoc. (I don't think this refers to Transaction, although the docs there are pretty thin.)

221782 No Perforce job exists for this issue. 1 32611
8 years, 7 weeks, 3 days ago Improved Javadoc for multi APIs.
Reviewed
0|i05y0v:
ZooKeeper ZOOKEEPER-1335

Add support for --config to zkEnv.sh to specify a config directory different than what is expected

Improvement Resolved Major Fixed Arpit Gupta Arpit Gupta Arpit Gupta 20/Dec/11 16:56   17/Dec/12 06:04 17/Dec/12 01:11   3.5.0     0 2   zkEnv.sh expects the ZOOCFGDIR environment variable to be set. If it is not, it looks for the conf dir in the ZOOKEEPER_PREFIX dir or in /etc/zookeeper. It would be great if we could support a --config option so that a different config directory can be specified at run time. We do the same thing in Hadoop.

With this you should be able to do

/usr/sbin/zkServer.sh --config /some/conf/dir start|stop
221592 No Perforce job exists for this issue. 2 41992
7 years, 14 weeks, 3 days ago 0|i07jwn:
ZooKeeper ZOOKEEPER-1334

Zookeeper 3.4.x is not OSGi compliant - MANIFEST.MF is flawed

Bug Closed Major Fixed Claus Ibsen Claus Ibsen Claus Ibsen 20/Dec/11 11:31   08/Oct/14 11:55 19/Dec/12 03:05 3.4.0 3.4.6, 3.5.0     3 12   In ZooKeeper 3.3.x you use log4j for logging, and the Maven dependency is,

e.g. from 3.3.4:
{code}
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.15</version>
<scope>compile</scope>
</dependency>
{code}

In 3.4.0 and later you switched to slf4j (in addition to, or instead of, log4j). The Maven pom.xml now includes:
{code}
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>1.6.1</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
<version>1.6.1</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.15</version>
<scope>compile</scope>
</dependency>
{code}

But the META-INF/MANIFEST.MF file in the distribution did not change to reflect this.

The 3.3.4 MANIFEST.MF Import-Package header:
{code}
Import-Package: javax.management,org.apache.log4j,org.osgi.framework;v
ersion="[1.4,2.0)",org.osgi.util.tracker;version="[1.1,2.0)"
{code}

And the 3.4.1 MANIFEST.MF Import-Package header:
{code}
Import-Package: javax.management,org.apache.log4j,org.osgi.framework;v
ersion="[1.4,2.0)",org.osgi.util.tracker;version="[1.1,2.0)"
{code}

This makes it impossible to use ZooKeeper 3.4.x in OSGi environments, as we get NoClassDefFoundError for slf4j classes.
221549 No Perforce job exists for this issue. 3 32612 6 years, 2 weeks ago
Reviewed
0|i05y13:
ZooKeeper ZOOKEEPER-1333

NPE in FileTxnSnapLog when restarting a cluster

Bug Closed Blocker Fixed Patrick D. Hunt Andrew McNair Andrew McNair 19/Dec/11 20:24   29/Dec/11 18:46 21/Dec/11 15:40 3.4.0 3.4.2, 3.5.0 server   0 7   I think an NPE was introduced by the fix for https://issues.apache.org/jira/browse/ZOOKEEPER-1269

Looking at DataTree.processTxn(TxnHeader header, Record txn), it seems likely that if rc.err != Code.OK, then rc.path will be null.

I'm currently working on a minimal test case for the bug, I'll attach it to this issue when it's ready.

{noformat}
java.lang.NullPointerException
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.processTransaction(FileTxnSnapLog.java:203)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:150)
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:418)
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:410)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:151)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
{noformat}


221456 No Perforce job exists for this issue. 7 32613
8 years, 14 weeks ago
Reviewed
0|i05y1b:
ZooKeeper ZOOKEEPER-1332

Zookeeper data is not in sync with quorum in the mentioned scenario

Bug Resolved Major Duplicate Unassigned amith amith 19/Dec/11 04:32   19/Dec/11 07:03 19/Dec/11 07:03 3.4.0 3.4.1 server   0 0   3 zookeeper quorum Please check the scenario below:

1. Configure 3 ZooKeeper servers in a quorum.
2. Start zk1 (F) and zk2 (L), then create a node from a Java client (the client connects to zk2).
3. Stop zk2 (L).
4. Start zk3. FLE now succeeds, but zookeeper-3 does not have the created node.

In step 4, zookeeper-3 gets a diff from the leader:

{noformat}
2011-12-19 20:15:59,379 [myid:3] - INFO [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2183:Environment@98] - Server environment:user.home=/root
2011-12-19 20:15:59,379 [myid:3] - INFO [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2183:Environment@98] - Server environment:user.dir=/home/amith/OpenSrc/zookeeper/zookeeper3/bin
2011-12-19 20:15:59,381 [myid:3] - INFO [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2183:ZooKeeperServer@168] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir ../dataDir/version-2 snapdir ../dataDir/version-2
2011-12-19 20:15:59,382 [myid:3] - INFO [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2183:Follower@63] - FOLLOWING - LEADER ELECTION TOOK - 102
2011-12-19 20:15:59,403 [myid:3] - INFO [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2183:Learner@322] - Getting a diff from the leader 0x10000000a
2011-12-19 20:15:59,449 [myid:3] - WARN [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2183:Learner@372] - Got zxid 0x10000000a expected 0x1
2011-12-19 20:15:59,450 [myid:3] - INFO [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2183:FileTxnSnapLog@255] - Snapshotting: 10000000a
{noformat}

but the diff does not contain all of the required data!

I think zookeeper-3 should get a snapshot from the leader here, not a diff.
221357 No Perforce job exists for this issue. 0 32614
8 years, 14 weeks, 3 days ago 0|i05y1j:
ZooKeeper ZOOKEEPER-1331

Typo in docs: acheive -> achieve

Bug Resolved Minor Fixed Andrew Ash Andrew Ash Andrew Ash 19/Dec/11 02:10   28/Dec/11 05:58 27/Dec/11 19:39 3.2.2 3.5.0 documentation   0 1   Found this typo while reading docs. Attaching SVN patch 221343 No Perforce job exists for this issue. 3 32615
8 years, 13 weeks, 1 day ago
Reviewed
0|i05y1r:
ZooKeeper ZOOKEEPER-1330

Zookeeper server not serving the client request even after completion of Leader election

Bug Open Minor Unresolved Unassigned amith amith 19/Dec/11 00:21   05/Feb/20 07:17   3.4.0 3.7.0, 3.5.8 server   0 8   3 zk quorum Have a cluster of 3 ZooKeeper servers.
90 clients are connected to the server.
The leader was killed and restarted.
The other 2 ZooKeeper servers started FLE and a leader was elected.

But it takes nearly 10 seconds for this server to serve requests, and meanwhile it reports "ZooKeeperServer not running".

Why is the server still not running even after leader election has completed?

{noformat}
2011-12-19 16:12:29,732 [myid:2] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2182:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2011-12-19 16:12:29,733 [myid:2] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2182:NIOServerCnxn@1000] - Closed socket connection for client /10.18.47.148:51965 (no session established for client)
2011-12-19 16:12:29,753 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2182:QuorumPeer@747] - LEADING
2011-12-19 16:12:29,762 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2182:Leader@58] - TCP NoDelay set to: true
2011-12-19 16:12:29,765 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2182:ZooKeeperServer@168] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir ../dataDir/version-2 snapdir ../dataDir/version-2
2011-12-19 16:12:29,766 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2182:Leader@294] - LEADING - LEADER ELECTION TOOK - 4663
2011-12-19 16:12:29,776 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2182:FileSnap@83] - Reading snapshot ../dataDir/version-2/snapshot.100013661
2011-12-19 16:12:29,831 [myid:2] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2182:NIOServerCnxnFactory@213] - Accepted socket connection from /10.18.47.148:51982
2011-12-19 16:12:29,831 [myid:2] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2182:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2011-12-19 16:12:29,832 [myid:2] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2182:NIOServerCnxn@1000] - Closed socket connection for client /10.18.47.148:51982 (no session established for client)
2011-12-19 16:12:29,884 [myid:2] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2182:NIOServerCnxnFactory@213] - Accepted socket connection from /10.18.47.148:51989
2011-12-19 16:12:29,884 [myid:2] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2182:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
{noformat}
221339 No Perforce job exists for this issue. 0 32616
22 weeks, 3 days ago 0|i05y1z:
ZooKeeper ZOOKEEPER-1329

Lock recipe sorts sequenced children incorrectly

Bug Open Major Unresolved Unassigned Evan McClure Evan McClure 15/Dec/11 19:04   15/Dec/11 19:05   3.3.3   recipes   1 2   Mac OS X Version 10.6.8
Darwin emcclure-lt-mac.local 10.8.0 Darwin Kernel Version 10.8.0: Tue Jun 7 16:33:36 PDT 2011; root:xnu-1504.15.3~1/RELEASE_I386 i386
Homebrew 0.8
The lock recipe sorts sequenced children lexicographically. When the sequence number wraps, a lexicographical comparison will always place 2147483648 ahead of -2147483649, place -2147483648 ahead of -2147483649, and place -1 ahead of -2. Clearly, we want 2147483648 < -2147483649, -2147483649 < -2147483648, and -2 placed ahead of -1, since those sequence numbers were generated in that order.

I suggest that the sequence numbers be converted to unsigned numbers before being compared in the comparison functor that gets passed to qsort().

This leaves us with another issue. When comparing unsigned sequence numbers, there is a slim chance that 4294967296 < 0. So, I suggest that a fudge range be used, say, the number of nodes in the quorum * some fudge factor, in order to handle this comparison.
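A closely related, well-known alternative to "unsigned plus a fudge range" is serial-number arithmetic (RFC 1982, the same idea as TCP sequence-number comparison): order two 32-bit counters by the sign of their wrapped difference. This is a swapped-in technique rather than the recipe's code, and it is written in Java purely for illustration (the recipe discussed here is C and uses qsort()). It assumes every live sequence number is fewer than 2^31 increments away from every other; outside that window the comparator is not transitive and must not be used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public final class WrapAwareSequenceOrder {
    // Orders signed 32-bit sequence numbers by generation order. Java int subtraction wraps
    // modulo 2^32, so a - b is negative exactly when a was generated before b, provided the
    // two values are less than 2^31 increments apart.
    public static final Comparator<Integer> GENERATION_ORDER =
            (a, b) -> Integer.signum(a - b);

    // Returns a sorted copy; the input list is left untouched.
    public static List<Integer> sortByGeneration(List<Integer> seqs) {
        List<Integer> copy = new ArrayList<>(seqs);
        copy.sort(GENERATION_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        // Values straddling the overflow from 2147483647 to -2147483648.
        List<Integer> seqs = Arrays.asList(-2147483647, 2147483646, -2147483648, 2147483647);
        System.out.println(sortByGeneration(seqs));
    }
}
```

Unlike a fixed fudge range, this needs no tuning: the only requirement is that the oldest and newest outstanding lock nodes never drift more than 2^31 increments apart.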

Please close this if I'm way off base here.
221077 No Perforce job exists for this issue. 0 32617
8 years, 15 weeks ago 0|i05y27:
ZooKeeper ZOOKEEPER-1328

Misplaced assertion for the test case 'FLELostMessageTest' and not identifying misfunctions

Test Resolved Major Fixed Rakesh Radhakrishnan Rakesh Radhakrishnan Rakesh Radhakrishnan 13/Dec/11 00:46   03/Sep/12 07:01 03/Sep/12 01:59 3.4.0 3.5.0 leaderElection   0 5   The assertion for testLostMessage is placed inside the thread's run() method, so an assertion failure is not reflected in the main test case.
I have observed that the test case still passes when the assertion fails or something malfunctions. Instead, the assertion should be moved into the test case itself, testLostMessage.

{noformat}
class LEThread extends Thread {
    public void run(){
        peer.setCurrentVote(v);
        LOG.info("Finished election: " + i + ", " + v.getId());
        Assert.assertTrue("State is not leading.", peer.getPeerState() == ServerState.LEADING);
    }
}
{noformat}
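A test framework such as JUnit only sees failures thrown on the test's own thread; an AssertionError thrown inside Thread.run() just terminates that worker. The sketch below shows the general fix in plain Java (no JUnit, and with hypothetical names rather than the real FLELostMessageTest fields): the worker only records what it observed, and the test thread joins it and asserts.

```java
import java.util.concurrent.atomic.AtomicReference;

public final class AssertOutsideThread {
    // Hypothetical stand-in for the peer state checked by the original test.
    enum ServerState { LOOKING, FOLLOWING, LEADING }

    // The worker only records what it observed; it never asserts.
    static final class ElectionThread extends Thread {
        private final AtomicReference<ServerState> observed = new AtomicReference<>();
        private final ServerState simulatedOutcome;

        ElectionThread(ServerState simulatedOutcome) {
            this.simulatedOutcome = simulatedOutcome;
        }

        @Override
        public void run() {
            // ... the election would run here; this sketch just records a simulated result ...
            observed.set(simulatedOutcome);
        }

        ServerState observedState() {
            return observed.get();
        }
    }

    // Starts the worker, waits for it, then checks the result on the calling (test) thread,
    // so a failure reaches the caller instead of dying with the worker thread.
    static void runAndCheck(ServerState simulatedOutcome) {
        ElectionThread t = new ElectionThread(simulatedOutcome);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for election thread", e);
        }
        if (t.observedState() != ServerState.LEADING) {
            throw new AssertionError("State is not leading: " + t.observedState());
        }
    }

    public static void main(String[] args) {
        runAndCheck(ServerState.LEADING);
        try {
            runAndCheck(ServerState.FOLLOWING);
            System.out.println("failure was missed");
        } catch (AssertionError expected) {
            System.out.println("caught: " + expected.getMessage());
        }
    }
}
```

With several election threads, the same pattern applies: join each one, then assert on all recorded results from the test method.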
220589 No Perforce job exists for this issue. 3 33289
7 years, 29 weeks, 3 days ago
Reviewed
0|i0627j:
ZooKeeper ZOOKEEPER-1327

there are still remnants of hadoop urls

Bug Resolved Major Fixed Harsh J Benjamin Reed Benjamin Reed 12/Dec/11 23:25   06/Feb/12 05:58 06/Feb/12 04:17   3.4.3, 3.5.0     0 2   There are still Hadoop URLs and references to ZooKeeper lists under the Hadoop project in the sources. 220587 No Perforce job exists for this issue. 3 32618
8 years, 7 weeks, 3 days ago Remove links to Hadoop wikis in ZooKeeper docs.
Reviewed
0|i05y2f:
ZooKeeper ZOOKEEPER-1326

The CLI commands "delete" and "rmr" are confusing. Can we have "delete" + "deleteall" instead?

Wish Resolved Trivial Fixed Harsh J Harsh J Harsh J 11/Dec/11 14:56   28/Dec/11 11:13 27/Dec/11 18:53 3.4.0 3.5.0 java client   0 2   ZOOKEEPER-729 introduced 'rmr' for recursive 'delete' operations on a given node. Following the Unix convention, wouldn't it be better to also have an 'rm' if an 'rmr' was added?

The current set is confusing. Or should we have 'delete' and 'deleteall', or something similar?

I know this is a nitpick, but I just dislike seeing poorly chosen keywords used for commands.

I'm happy to produce a backwards-compatible patch if this is acceptable.
220398 No Perforce job exists for this issue. 2 33290
8 years, 13 weeks, 1 day ago
Reviewed
0|i0627r:
ZooKeeper ZOOKEEPER-1325

Log maxClientCnxn warning in INFO level

Improvement Resolved Minor Invalid Unassigned Mubarak Seyed Mubarak Seyed 09/Dec/11 17:09   09/Dec/11 18:36 09/Dec/11 17:30 3.3.3, 3.3.4, 3.4.0   server   0 0   When the HBase client's ZooKeeperWatcher gets a ConnectionLossException (/hbase/rs or /hbase), it is very hard to debug from the ZK server log if the ZK server was started with log4j at INFO level.
When the maxClientCnxns limit is reached for a single client (at the socket level), it would be nice to log it at INFO level instead of WARN. This would help HBase clients (RegionServer, HMaster, and the HBase client lib) debug the issue in production.

{code}

3.4 - src/java/main/org/apache/zookeeper/server/NIOServerCnxnFactory.java
3.3.4 - src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java

public void run() {
    while (!ss.socket().isClosed()) {
        try {
            ...
            ...

            if (maxClientCnxns > 0 && cnxncount >= maxClientCnxns) {
                LOG.info("Too many connections from " + ia
                         + " - max is " + maxClientCnxns);
                sc.close();
            }
            ...
        }

{code}
noob 220275 No Perforce job exists for this issue. 0 33291
8 years, 15 weeks, 6 days ago 0|i0627z:
ZooKeeper ZOOKEEPER-1324

Remove Duplicate NEWLEADER packets from the Leader to the Follower.

Improvement Closed Critical Fixed Flavio Paiva Junqueira Mahadev Konar Mahadev Konar 09/Dec/11 13:39   13/Mar/14 14:17 14/May/13 08:29 3.5.0 3.4.6, 3.5.0 quorum   0 10   220241 No Perforce job exists for this issue. 11 32619
6 years, 2 weeks ago 0|i05y2n:
ZooKeeper ZOOKEEPER-1323

c client doesn't compile on freebsd

Bug Closed Major Fixed Michi Mutsuzaki Michi Mutsuzaki Michi Mutsuzaki 08/Dec/11 20:28   29/Dec/11 18:46 14/Dec/11 18:19 3.4.0 3.4.2, 3.5.0 c client   0 1   freebsd 6.4 EAI_NODATA and EAI_ADDRFAMILY have been deprecated in FreeBSD. I'm getting this error:

{noformat}
src/zookeeper.c: In function `getaddrinfo_errno':
src/zookeeper.c:446: error: `EAI_NODATA' undeclared (first use in this function)
src/zookeeper.c:446: error: (Each undeclared identifier is reported only once
src/zookeeper.c:446: error: for each function it appears in.)
src/zookeeper.c: In function `getaddrs':
src/zookeeper.c:581: error: `EAI_ADDRFAMILY' undeclared (first use in this function)
{noformat}

I'll submit a patch.

--Michi
220141 No Perforce job exists for this issue. 1 32620
8 years, 15 weeks ago
Reviewed
0|i05y2v:
ZooKeeper ZOOKEEPER-1322

Cleanup/fix logging in Quorum code.

Improvement Resolved Major Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 07/Dec/11 19:47   06/Feb/12 05:58 06/Feb/12 03:44 3.4.0, 3.5.0 3.4.3, 3.5.0 server   0 0   While triaging ZOOKEEPER-1319, I updated the code with the attached patch to help debug what was going on with that issue. I think it would be useful to include these changes in the project itself. Feel free to include this in 3.4.1 or push it to 3.5.0.

You should verify this with TRACE logging turned on in addition to INFO (default).
220004 No Perforce job exists for this issue. 2 33292
8 years, 7 weeks, 3 days ago Improved logging in Quorum Code.
Reviewed
0|i06287:
ZooKeeper ZOOKEEPER-1321

Add number of client connections metric in JMX and srvr

Improvement Resolved Major Fixed Neha Narkhede Neha Narkhede Neha Narkhede 07/Dec/11 11:19   10/Feb/12 19:16 10/Feb/12 19:16 3.3.4, 3.4.2 3.4.4, 3.5.0     0 4   The related conversation on the zookeeper user mailing list is here - http://apache.markmail.org/message/4jjcmooniowwugu2?q=+list:org.apache.hadoop.zookeeper-user

It is useful to be able to monitor the number of disconnect operations on a client. A high number generally indicates a client going through frequent GC pauses and hence disconnecting far too often from a ZooKeeper cluster.

Today, this information is only exposed indirectly through the stat command, which requires counting the results. That's a lot of work for the server to do just to get the connection count.

For monitoring purposes, it would be useful to have this exposed through JMX and the srvr four-letter word (4lw).
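A minimal sketch of the idea, assuming a counter that is updated as connections open and close, so reading it costs O(1) instead of enumerating connections the way stat does. The interface, attribute name (NumAliveConnections), ObjectName, and status-line format are all hypothetical illustrations, not ZooKeeper's actual MBeans or srvr output; only standard JDK JMX classes are used.

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public final class ConnectionCountMetric {
    // Hypothetical management interface; any public interface whose name ends in "MXBean"
    // is treated by JMX as an MXBean interface.
    public interface ConnectionStatsMXBean {
        int getNumAliveConnections();
    }

    public static final class ConnectionStats implements ConnectionStatsMXBean {
        private final AtomicInteger alive = new AtomicInteger();

        public void connectionOpened() { alive.incrementAndGet(); }

        public void connectionClosed() { alive.decrementAndGet(); }

        @Override
        public int getNumAliveConnections() { return alive.get(); }

        // Hypothetical plain-text line a status command such as srvr could print.
        public String statusLine() { return "Connections: " + alive.get(); }
    }

    // Registers the bean on the platform MBean server under a caller-chosen name.
    public static ObjectName register(ConnectionStats stats, String name) {
        try {
            ObjectName objectName = new ObjectName(name);
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            server.registerMBean(stats, objectName);
            return objectName;
        } catch (JMException e) {
            throw new IllegalStateException("JMX registration failed", e);
        }
    }

    // Reads the attribute back through JMX, as a monitoring tool would.
    public static int readAliveConnections(ObjectName objectName) {
        try {
            return (Integer) ManagementFactory.getPlatformMBeanServer()
                    .getAttribute(objectName, "NumAliveConnections");
        } catch (JMException e) {
            throw new IllegalStateException("JMX read failed", e);
        }
    }

    public static void main(String[] args) {
        ConnectionStats stats = new ConnectionStats();
        ObjectName name = register(stats, "example.zk:type=ConnectionStats");
        for (String event : Arrays.asList("open", "open", "close")) {
            if (event.equals("open")) stats.connectionOpened(); else stats.connectionClosed();
        }
        System.out.println(readAliveConnections(name) + " / " + stats.statusLine());
    }
}
```

Keeping one counter behind both the JMX attribute and the text line ensures the two monitoring paths always report the same number.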
patch 219931 No Perforce job exists for this issue. 8 12496
8 years, 6 weeks, 6 days ago 0|i02hvb:
ZooKeeper ZOOKEEPER-1320

Add the feature to zookeeper allow client limitations by ip.

New Feature Resolved Major Incomplete Leader Ni Leader Ni Leader Ni 06/Dec/11 05:43   18/Mar/12 02:28 18/Mar/12 02:28 3.3.3   server   0 0 604800 604800 0% Linux version 2.6.18-164.el5 (gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)), jdk-1.6.0_17 Add a feature to ZooKeeper so that an administrator can set a list of IPs that limits which nodes can connect to the ZK servers and which connected clients can operate on data. 0% 0% 604800 604800 client,server,limited,ipfilter 219737 No Perforce job exists for this issue. 4 33293
8 years, 1 week, 5 days ago Add a feature to ZooKeeper so that an administrator can set a list of IPs that limits which nodes can connect to the ZK servers and which connected clients can operate on data.
0|i0628f:
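For the "which connected clients can operate on data" half, ZooKeeper's ACLs already include an "ip" scheme that restricts operations per znode. The connection-level half could look roughly like the sketch below: an exact-match allowlist consulted when a socket is accepted. The class and method names are hypothetical, this is not ZooKeeper code, and a real feature would likely also need CIDR ranges and a config option.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public final class IpAllowList {
    private final Set<InetAddress> allowed = new HashSet<>();

    // Intended for IP literals: for a literal, InetAddress.getByName only checks the format
    // and performs no DNS lookup.
    public IpAllowList(List<String> ipLiterals) {
        for (String ip : ipLiterals) {
            allowed.add(parse(ip));
        }
    }

    private static InetAddress parse(String ip) {
        try {
            return InetAddress.getByName(ip);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("bad IP literal: " + ip, e);
        }
    }

    // A server would consult this when it accepts a socket, before creating a session.
    public boolean permits(InetAddress remote) {
        return allowed.contains(remote);
    }

    public boolean permits(String ipLiteral) {
        return permits(parse(ipLiteral));
    }

    public static void main(String[] args) {
        IpAllowList list = new IpAllowList(Arrays.asList("127.0.0.1", "10.18.47.148"));
        System.out.println(list.permits("10.18.47.148"));
        System.out.println(list.permits("10.18.47.149"));
    }
}
```

A rejected client would simply have its socket closed, much like the existing maxClientCnxns check does when a per-IP limit is exceeded.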
ZooKeeper ZOOKEEPER-1319

Missing data after restarting+expanding a cluster

Bug Closed Blocker Fixed Patrick D. Hunt Jeremy Stribling Jeremy Stribling 05/Dec/11 22:06   16/Dec/11 20:33 09/Dec/11 14:09 3.4.0 3.4.1, 3.5.0     0 4   Linux (Debian Squeeze) I've been trying to update to ZK 3.4.0 and have had some issues where some data become inaccessible after adding a node to a cluster. My use case is a bit strange (as explained before on this list) in that I try to grow the cluster dynamically by having an external program automatically restart Zookeeper servers in a controlled way whenever the list of participating ZK servers needs to change. This used to work just fine in 3.3.3 (and before), so this represents a regression.

The scenario I see is this:

1) Start up a 1-server ZK cluster (the server has ZK ID 0).
2) A client connects to the server, and makes a bunch of znodes, in particular a znode called "/membership".
3) Shut down the cluster.
4) Bring up a 2-server ZK cluster, including the original server 0 with its existing data, and a new server with ZK ID 1.
5) Node 0 has the highest zxid and is elected leader.
6) A client connecting to server 1 tries to "get /membership" and gets back a -101 error code (no such znode).
7) The same client then tries to "create /membership" and gets back a -110 error code (znode already exists).
8) Clients connecting to server 0 can successfully "get /membership".

I will attach a tarball with debug logs for both servers, annotating where steps #1 and #4 happen. You can see that the election involves a proposal for zxid 110 from server 0, but immediately following the election server 1 has these lines:

2011-12-05 17:18:48,308 9299 [QuorumPeer[myid=1]/127.0.0.1:2901] WARN org.apache.zookeeper.server.quorum.Learner - Got zxid 0x100000001 expected 0x1
2011-12-05 17:18:48,313 9304 [SyncThread:1] INFO org.apache.zookeeper.server.persistence.FileTxnLog - Creating new log file: log.100000001

Perhaps that's not relevant, but it struck me as odd. At the end of server 1's log you can see a repeated cycle of getData->create->getData as the client tries to make sense of the inconsistent responses.
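The two zxids in the warning are easier to read once split into their parts: a zxid is a 64-bit value whose high 32 bits hold the leader epoch and whose low 32 bits hold a counter within that epoch. The helper below (`Zxid` is an illustrative name, not a ZooKeeper class) decodes them: `0x100000001` is epoch 1, counter 1, while the expected `0x1` is epoch 0, counter 1. So server 1 expected the first transaction of epoch 0 but was sent the first transaction of epoch 1; this is only a reading of the log line, not a diagnosis of the bug.

{code:java}
// Splits a zxid into its epoch (high 32 bits) and counter (low 32 bits).
final class Zxid {
    private Zxid() {}

    static long epoch(long zxid) {
        return zxid >>> 32;
    }

    static long counter(long zxid) {
        return zxid & 0xffffffffL;
    }

    static long make(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    static String describe(long zxid) {
        return String.format("0x%x = epoch %d, counter %d", zxid, epoch(zxid), counter(zxid));
    }
}
{code}

For example, `Zxid.describe(0x100000001L)` gives "0x100000001 = epoch 1, counter 1".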

The other piece of information is that if I try to use the on-disk directories for either of the servers to start a new one-node ZK cluster, all the data are accessible.

I haven't tried writing a program outside of my application to reproduce this, but I can do it very easily with some of my app's tests if anyone needs more information.
cluster, data 219708 No Perforce job exists for this issue. 5 32621
8 years, 15 weeks, 6 days ago
Reviewed
0|i05y33:
ZooKeeper ZOOKEEPER-1318

In Python binding, get_children (and get and exists, and probably others) with expired session doesn't raise exception properly

Bug Resolved Major Fixed Henry Robinson Jim Fulton Jim Fulton 04/Dec/11 14:06   11/May/12 07:00 09/May/12 21:54 3.3.3 3.3.6, 3.4.4, 3.5.0 contrib-bindings   1 5   Mac OS X (at least) In Python binding, get_children (and get and exists, and probably others) with expired session doesn't raise exception properly.


>>> zookeeper.state(h)
-112
>>> zookeeper.get_children(h, '/')
Traceback (most recent call last):
File "<console>", line 1, in <module>
SystemError: error return without exception set

Let me know if you'd like me to work on a patch.
219526 No Perforce job exists for this issue. 2 32622
7 years, 45 weeks, 6 days ago
Reviewed
0|i05y3b:
ZooKeeper ZOOKEEPER-1317

Possible segfault in zookeeper_init

Bug Closed Minor Fixed Akira Kitada Akira Kitada Akira Kitada 04/Dec/11 08:48   16/Dec/11 20:33 09/Dec/11 13:37 3.3.3, 3.4.0 3.4.1, 3.5.0 c client   0 1   zookeeper_init does not check the return value of strdup(index_chroot).
When it returns NULL, this causes a segfault when it tries to call strlen(zh->chroot).
219508 No Perforce job exists for this issue. 1 32623
8 years, 15 weeks, 6 days ago
Reviewed
0|i05y3j:
ZooKeeper ZOOKEEPER-1316

zookeeper_init leaks memory if chroot is just '/'

Bug Closed Minor Fixed Akira Kitada Akira Kitada Akira Kitada 04/Dec/11 08:34   16/Dec/11 20:33 08/Dec/11 17:30 3.3.3, 3.4.0 3.4.1, 3.5.0 c client   0 1   zookeeper_init does not free strdup'ed memory when chroot is just '/'.
219507 No Perforce job exists for this issue. 1 32624
8 years, 15 weeks, 6 days ago
Reviewed
0|i05y3r:
ZooKeeper ZOOKEEPER-1315

zookeeper_init always reports sessionPasswd=<hidden>

Bug Closed Minor Fixed Akira Kitada Akira Kitada Akira Kitada 04/Dec/11 05:31   16/Dec/11 20:33 08/Dec/11 19:21 3.3.4, 3.4.0 3.4.1, 3.5.0 c client   0 1   zookeeper_init always reports sessionPasswd=<hidden> even when it's empty. 219502 No Perforce job exists for this issue. 1 32625
8 years, 15 weeks, 6 days ago
Reviewed
0|i05y3z:
ZooKeeper ZOOKEEPER-1314

improve zkpython synchronous api implementation

Improvement Open Minor Unresolved Daniel Lescohier Daniel Lescohier Daniel Lescohier 01/Dec/11 16:53   26/Jul/13 14:21   3.3.3   contrib-bindings   1 5 1800 1800 0% Improves the following items in zkpython which are related to the Zookeeper synchronous API:

# For pyzoo_create, no longer limit the returned znode name to 256 bytes; dynamically allocate memory on the heap.
# For all the synchronous api calls, release the Python Global Interpreter Lock just before doing the synchronous call.

I will attach the patch shortly.
0% 0% 1800 1800 219245 No Perforce job exists for this issue. 2 41993
6 years, 34 weeks, 6 days ago Improves zkpython synchronous api; release GIL before synchronous calls, and do not limit returned znode name to 256 bytes for synchronous create call.
0|i07jwv:
ZooKeeper ZOOKEEPER-1313

Expose/create KeeperException for "Packet len <x> is out of range!" error when jute max buffer size is exceeded

Bug Open Major Unresolved Unassigned Daniel Lord Daniel Lord 30/Nov/11 19:26   15/Sep/14 23:06           0 3   When a ZooKeeper client receives a packet that is over the jute max buffer limit, the behavior exposed to callers of the ZooKeeper client is misleading. When the packet length exceeds the maximum size, an IOException is thrown. The SendThread catches it and handles it by cleaning up the current connection and enqueueing a Disconnected event. The immediate caller sees this as a ConnectionLossException with a Disconnected event on the main Watcher. This state transition is misleading because, in many circumstances, retrying the same operation succeeds as soon as the SyncConnected event is received. In this case, however, the client will likely reconnect immediately, and if the operation is retried the same jute max buffer limit exception will be thrown, triggering another disconnect and reconnect.

It would be great if the exception were exposed to the caller of the ZooKeeper client somehow, so that a more appropriate action could be taken. For instance, it might be appropriate to fail completely or to attempt to establish a new session.
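A minimal sketch of the kind of change being asked for, assuming a hypothetical `PacketTooLargeException` subclass of `IOException` (not an existing ZooKeeper type). The length check that produces "Packet len <x> is out of range!" throws the distinct subclass, so code that catches it can fail fast or start a new session instead of retrying after reconnecting. `LengthPrefixReader` is also illustrative; it only assumes that each packet is prefixed by a 4-byte big-endian length.

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;

// Hypothetical exception type: lets callers tell an oversized packet apart from an
// ordinary network failure, which would otherwise surface as connection loss.
class PacketTooLargeException extends IOException {
    final int length;
    final int limit;

    PacketTooLargeException(int length, int limit) {
        super("Packet len " + length + " is out of range! (limit " + limit + ")");
        this.length = length;
        this.limit = limit;
    }
}

final class LengthPrefixReader {
    private LengthPrefixReader() {}

    // Reads the length prefix of the next packet and validates it against the
    // configured maximum (for example, the jute.maxbuffer value).
    static int readLength(ByteBuffer lenBuffer, int maxPacketLen) throws IOException {
        int len = lenBuffer.getInt();
        if (len < 0) {
            // A negative length means the stream is corrupt: treat it as a connection problem.
            throw new IOException("Invalid negative packet length " + len);
        }
        if (len >= maxPacketLen) {
            // Retrying after a reconnect would hit the same limit, so report it distinctly.
            throw new PacketTooLargeException(len, maxPacketLen);
        }
        return len;
    }
}
{code}

Because the subclass still extends IOException, existing code that treats any IOException as a connection problem keeps working; only callers that want the distinction need to catch the subclass.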
219111 No Perforce job exists for this issue. 0 32626
5 years, 27 weeks, 2 days ago 0|i05y47:
ZooKeeper ZOOKEEPER-1312

Add a "getChildrenWithStat" operation

New Feature Open Major Unresolved Unassigned Daniel Lord Daniel Lord 30/Nov/11 18:46   02/Dec/11 19:54           0 1   It would be extremely useful to be able to have a "getChildrenWithStat" method. This method would behave exactly the same as getChildren but in addition to returning the list of all child znode names it would also return a Stat for each child. I'm sure there are quite a few use cases for this but it could save a lot of extra reads for my application. newbie 219102 No Perforce job exists for this issue. 0 41994
8 years, 16 weeks, 5 days ago 0|i07jx3:
ZooKeeper ZOOKEEPER-1311

ZooKeeper test jar is broken

Bug Closed Blocker Fixed Ivan Kelly Ivan Kelly Ivan Kelly 30/Nov/11 13:04   16/Dec/11 20:33 01/Dec/11 02:14 3.4.0 3.4.1, 3.5.0     0 1   In http://repo1.maven.org/maven2/org/apache/zookeeper/zookeeper/3.4.0/ the test jar cannot be accessed by maven.

There are two possible solutions to this.
a) rename zookeeper-3.4.0-test.jar to zookeeper-3.4.0-tests.jar and remove zookeeper-3.4.0-test.pom*
With this, Maven can access the test jar with:

{code}
<dependency>
  <groupId>org.apache.zookeeper</groupId>
  <artifactId>zookeeper</artifactId>
  <version>3.4.0</version>
  <type>test-jar</type>
  <scope>test</scope>
</dependency>
{code}

b) Alternatively, the ZooKeeper test jar could be its own submodule (zookeeper-test). To do this, it must be deployed in the following layout:
{code}
./org/apache/zookeeper/zookeeper-test/3.4.0-BK-SNAPSHOT/zookeeper-test-3.4.0.jar
./org/apache/zookeeper/zookeeper-test/3.4.0-BK-SNAPSHOT/zookeeper-test-3.4.0.jar.md5
./org/apache/zookeeper/zookeeper-test/3.4.0-BK-SNAPSHOT/zookeeper-test-3.4.0.jar.sha1
./org/apache/zookeeper/zookeeper-test/3.4.0-BK-SNAPSHOT/zookeeper-test-3.4.0.pom
./org/apache/zookeeper/zookeeper-test/3.4.0-BK-SNAPSHOT/zookeeper-test-3.4.0.pom.md5
./org/apache/zookeeper/zookeeper-test/3.4.0-BK-SNAPSHOT/zookeeper-test-3.4.0.pom.sha1
{code}

This can then be accessed by maven with
{code}
<dependency>
  <groupId>org.apache.zookeeper</groupId>
  <artifactId>zookeeper-test</artifactId>
  <version>3.4.0</version>
  <scope>test</scope>
</dependency>
{code}


I think a) is the better solution.
219050 No Perforce job exists for this issue. 1 32627
8 years, 17 weeks ago
Reviewed
0|i05y4f:
ZooKeeper ZOOKEEPER-1310

C Api should use state CONNECTION_LOSS

New Feature Open Major Unresolved Unassigned Jakub Lekstan Jakub Lekstan 30/Nov/11 07:47   06/May/12 00:51       c client   0 1   Linux I would like ZooKeeper to notify my watcher (the one I pass to zookeeper_init) about CONNECTION_LOSS. Right now the given watcher is not told that the connection has been lost, so I can't react to it.

What do you think? If you agree, I could try to create a patch.
219010 No Perforce job exists for this issue. 0 41995
7 years, 46 weeks, 4 days ago 0|i07jxb:
ZooKeeper ZOOKEEPER-1309

Creating a new ZooKeeper client can leak file handles

Bug Resolved Critical Fixed Daniel Lord Daniel Lord Daniel Lord 28/Nov/11 21:03   26/Feb/12 19:32 26/Feb/12 19:32 3.3.4 3.3.5 java client   3 1   If an IOException is thrown by the constructor of ClientCnxn, file handles are leaked because the Selector it initializes is never closed:

final Selector selector = Selector.open();

If there is an abnormal exit from the constructor, the Selector is not closed and file handles are leaked. You can easily see this by setting the hosts string to garbage ("qwerty", "asdf", etc.) and then trying to open a new ZooKeeper connection. I've observed the same behavior in production when DNS issues meant the host names of the ensemble could no longer be resolved, and the application servers quickly ran out of handles attempting to (re)connect to ZooKeeper.
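The usual fix for this pattern is to close the resource on every abnormal exit from the constructor. The sketch below shows that shape with plain NIO; `SelectorOwner` and its `hostsValid` flag are hypothetical stand-ins for ClientCnxn and for whatever later step fails (for example, unresolvable host names), not ZooKeeper code.

{code:java}
import java.io.IOException;
import java.nio.channels.Selector;

// Hypothetical stand-in for ClientCnxn: it owns a Selector opened in its constructor.
class SelectorOwner implements AutoCloseable {
    final Selector selector;

    SelectorOwner(boolean hostsValid) throws IOException {
        Selector s = Selector.open();
        try {
            if (!hostsValid) {
                // Stands in for any later constructor step that can fail.
                throw new IOException("Unable to resolve any ensemble host");
            }
        } catch (IOException | RuntimeException e) {
            // Close the Selector before propagating; otherwise its file handles leak.
            try {
                s.close();
            } catch (IOException closeFailure) {
                e.addSuppressed(closeFailure);
            }
            throw e;
        }
        this.selector = s;
    }

    @Override
    public void close() throws IOException {
        selector.close();
    }
}
{code}

With this shape, constructing the object in a loop with a bad host string raises an IOException each time without accumulating open selectors.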
218779 No Perforce job exists for this issue. 4 32628
8 years, 4 weeks, 4 days ago 0|i05y4n:
ZooKeeper ZOOKEEPER-1308

Guaranteed NPE in WriteLock recipe

Bug Resolved Minor Invalid Unassigned Mark Miller Mark Miller 24/Nov/11 08:43   24/Nov/11 08:48 24/Nov/11 08:48     recipes   0 0   {code}
public boolean execute() throws KeeperException, InterruptedException {
    do {
        if (id == null) {
            long sessionId = zookeeper.getSessionId();
            String prefix = "x-" + sessionId + "-";
            // lets try look up the current ID if we failed
            // in the middle of creating the znode
            findPrefixInChildren(prefix, zookeeper, dir);
            idName = new ZNodeName(id);
        }
{code}

ZNodeName will throw an NPE if id is null.
218348 No Perforce job exists for this issue. 0 32629
8 years, 18 weeks ago 0|i05y4v:
ZooKeeper ZOOKEEPER-1307

zkCli.sh exits when an InvalidACLException is thrown by the setAcl command

Bug Resolved Minor Fixed kavita sharma amith amith 23/Nov/11 04:07   23/Apr/12 13:17 16/Mar/12 20:42   3.4.4, 3.5.0 java client   0 3   zkCli.sh Use the console client (zkCli.sh) and issue setAcl /temp abc:
[zk: XX.XX.XX.XX:XXXX(CONNECTED) 17] setAcl /temp abc
abc does not have the form scheme:id:perm
Exception in thread "main" org.apache.zookeeper.KeeperException$InvalidACLException: KeeperErrorCode = InvalidACL
at org.apache.zookeeper.ZooKeeper.setACL(ZooKeeper.java:1172)
at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:717)
at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:582)
at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:354)
at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:312)
at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:271)
linux-xxx:/zookeeper1/bin #

If any InvalidACLException is thrown, the client exits.
The client should be able to handle this kind of error instead.
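A minimal sketch of the requested behavior, assuming hypothetical `AclSpec` and `SetAclCommand` classes (this is not the real ZooKeeperMain code). The ACL argument is validated before anything is sent, and the command returns an error message instead of letting an exception escape the shell's main loop. The permission bits match ZooKeeper's `ZooDefs.Perms` values (READ=1, WRITE=2, CREATE=4, DELETE=8, ADMIN=16), and the letters r, w, c, d, a are the ones used in zkCli ACL strings such as world:anyone:cdrwa.

{code:java}
// Parsed form of a "scheme:id:perm" ACL argument (hypothetical class).
final class AclSpec {
    static final int READ = 1, WRITE = 2, CREATE = 4, DELETE = 8, ADMIN = 16;

    final String scheme;
    final String id;
    final int perms;

    private AclSpec(String scheme, String id, int perms) {
        this.scheme = scheme;
        this.id = id;
        this.perms = perms;
    }

    // The id may itself contain ':' (digest ids do), so the scheme ends at the first ':'
    // and the permission letters start after the last one.
    static AclSpec parse(String spec) {
        int first = spec.indexOf(':');
        int last = spec.lastIndexOf(':');
        if (first < 0 || first == last) {
            throw new IllegalArgumentException(spec + " does not have the form scheme:id:perm");
        }
        int perms = 0;
        for (char c : spec.substring(last + 1).toCharArray()) {
            switch (c) {
                case 'r': perms |= READ; break;
                case 'w': perms |= WRITE; break;
                case 'c': perms |= CREATE; break;
                case 'd': perms |= DELETE; break;
                case 'a': perms |= ADMIN; break;
                default:
                    throw new IllegalArgumentException("Unknown permission '" + c + "' in " + spec);
            }
        }
        return new AclSpec(spec.substring(0, first), spec.substring(first + 1, last), perms);
    }
}

// Hypothetical handler for "setAcl <path> <acl>": a bad argument produces an error
// message and control returns to the shell loop instead of terminating the process.
final class SetAclCommand {
    private SetAclCommand() {}

    static String run(String path, String aclArg) {
        try {
            AclSpec acl = AclSpec.parse(aclArg);
            // A real client would call ZooKeeper.setACL here and catch KeeperException
            // (including InvalidACLException) in the same way.
            return "OK " + path + " " + acl.scheme + ":" + acl.id + " perms=" + acl.perms;
        } catch (IllegalArgumentException e) {
            return "Error: " + e.getMessage();
        }
    }
}
{code}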

newbie 218174 No Perforce job exists for this issue. 1 32630
8 years, 1 week, 5 days ago
Reviewed
0|i05y53:
ZooKeeper ZOOKEEPER-1306

hang in zookeeper_close()

Bug Open Major Unresolved Michael Lee helei helei 19/Nov/11 02:02   29/Nov/11 16:52   3.3.3   c client   1 0   With the ZOOKEEPER-981 patch applied, I saw another problem: zookeeper_close() hangs again. Here is the stack:
(gdb) bt
#0 0x000000302b80adfb in __lll_mutex_lock_wait () from /lib64/tls/libpthread.so.0
#1 0x000000302b1307a8 in main_arena () from /lib64/tls/libc.so.6
#2 0x000000302b910230 in stack_used () from /lib64/tls/libpthread.so.0
#3 0x000000302b808dde in pthread_cond_broadcast@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
#4 0x00000000006b4ce8 in adaptor_finish (zh=0x6902060) at src/mt_adaptor.c:217
#5 0x00000000006b0fd0 in zookeeper_close (zh=0x6902060) at src/zookeeper.c:2297
(gdb) p zh->ref_counter
$5 = 1
(gdb) p zh->close_requested
$6 = 1
(gdb) p *zh
$7 = {fd = 110112576, hostname = 0x6903620 "", addrs = 0x0, addrs_count = 1,
watcher = 0x62e5dc <doris::meta_register_mgr_t::register_mgr_watcher(_zhandle*, int, int, char const*, void*)>, last_recv = {tv_sec = 1321510694, tv_usec = 552835}, last_send = {tv_sec = 1321510694, tv_usec = 552886}, last_ping = {tv_sec = 1321510685, tv_usec = 774869}, next_deadline = { tv_sec = 1321510704, tv_usec = 547831}, recv_timeout = 30000, input_buffer = 0x0, to_process = {head = 0x0, last = 0x0, lock = {__m_reserved = 0,
__m_count = 0, __m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 0, __spinlock = 0}}}, to_send = {head = 0x0, last = 0x0, lock = {
__m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 1, __m_lock = {__status = 0, __spinlock = 0}}}, sent_requests = {head = 0x0, last = 0x0,
cond = {__c_lock = {_status = 1, __spinlock = -1}, __c_waiting = 0x0, __padding = '\0' <repeats 15 times>, __align = 0}, lock = {_m_reserved = 0,
__m_count = 0, __m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 0, __spinlock = 0}}}, completions_to_process = {head = 0x2aefbff800,
last = 0x2af0e05f40, cond = {__c_lock = {__status = 592705486850, __spinlock = -1}, __c_waiting = 0x45,
_padding = "E\000\000\000\000\000\000\000\220\006\000\000\000", __align = 296352743424}, lock = {_m_reserved = 1, __m_count = 0,
__m_owner = 0x1000026ca, __m_kind = 0, __m_lock = {__status = 0, __spinlock = 0}}}, connect_index = 0, client_id = {client_id = 86551148676999146, passwd = "G懵擀\233\213\f闬202筴\002錪\034"}, last_zxid = 82057372, outstanding_sync = 0, primer_buffer = {buffer = 0x6902290 "", len = 40, curr_offset = 44, next = 0x0}, primer_storage = {len = 36, protocolVersion = 0, timeOut = 30000, sessionId = 86551148676999146, passwd_len = 16, passwd = "G懵擀\233\213\f闬202筴\002錪\034"},
primer_storage_buffer = "\000\000\000$\000\000\000\000\000\000u0\0013}惜薵闬000\000\000\020G懵擀\233\213\f闬202筴\002錪\034", state = 0, context = 0x0,
auth_h = {auth = 0x0, lock = {__m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 0, __spinlock = 0}}},
ref_counter = 1, close_requested = 1, adaptor_priv = 0x0, socket_readable = {tv_sec = 0, tv_usec = 0}, active_node_watchers = 0x6901520,
active_exist_watchers = 0x69015d0, active_child_watchers = 0x6902ef0, chroot = 0x0}
I think the ref_counter is supposed to be 2, 3 or 4 here; 1 does not seem correct. I think we should increase the ref_counter before we set zh->close_requested=1; otherwise the do_io thread and the do_completion thread may release the handle just after we set zh->close_requested and before we increase zh->ref_counter. Thanks again
217774 No Perforce job exists for this issue. 1 32631
8 years, 17 weeks, 2 days ago must exclude patch in ZOOKEEPER-981 0|i05y5b:
ZooKeeper ZOOKEEPER-1305

zookeeper.c:prepend_string func can dereference null ptr

Bug Closed Major Fixed Daniel Lescohier Daniel Lescohier Daniel Lescohier 18/Nov/11 13:23   02/May/12 22:06 08/Dec/11 17:03 3.3.3 3.4.1, 3.3.6, 3.5.0 c client   0 3 1800 1800 0% All All the callers of the function prepend_string make a call to prepend_string before checking that zhandle_t *zh is not null. At the top of prepend_string, zh is dereferenced without checking for a null ptr:

static char* prepend_string(zhandle_t *zh, const char* client_path) {
    char *ret_str;
    if (zh->chroot == NULL)
        return (char *) client_path;

I propose fixing this by adding the check here in prepend_string:

static char* prepend_string(zhandle_t *zh, const char* client_path) {
    char *ret_str;
    if (zh == NULL || zh->chroot == NULL)
        return (char *) client_path;
0% 0% 1800 1800 patch 217712 No Perforce job exists for this issue. 2 32632
7 years, 47 weeks ago return ZBADARGUMENTS when passed NULL zhandle instead of dereferencing null pointer. 0|i05y5j:
ZooKeeper ZOOKEEPER-1304

[IGNORE THIS --- MOVING TO BOOKKEEPER JIRA] publish and subscribe methods get ServiceDownException even when the hubs, bookies, and zookeepers are running

Bug Resolved Major Duplicate Unassigned Daniel Kim Daniel Kim 17/Nov/11 18:37   09/Oct/13 20:09 09/Oct/13 20:09 3.5.0       0 0 1209600 1209600 0% CentOS 5.5 for all servers and workstations (however zookeeper, bookies, and hubs are all built in Ubuntu 11);
OpenJDK Runtime Environment (IcedTea6 1.9.10) (rhel-1.23.1.9.10.el5_7-i386);
OpenJDK Client VM (build 19.0-b09, mixed mode);

**[Sorry. I don't know how to delete an issue that is already submitted. I just learned of the Bookkeeper jira, and I will submit this issue there instead. You can all ignore this issue.]


Since I couldn't finish building all Hedwig components on CentOS, I built them successfully on Ubuntu and then deployed them to CentOS (there is no Ubuntu image in my company's cloud). I configured ZooKeeper, bookies and hubs as described in the documentation. First, I copied TestPubSubClient.java's publish and subscribe tests into my own test code. I also had to create another class that extends ClientConfiguration. I named it "HedwigConf" and overrode the getDefaultServerHedwigSocketAddress() method, because the server was not on the same machine as the workstation. I targeted the right host and publish seemed to work. However, publish sometimes throws a ServiceDownException. I checked the logs of the hubs. They seem to have connected to the bookies without problems; there was no error or warning there. However, the problem seemed to lie in the bookies and ZooKeeper. This was found in the ZooKeeper log: "Got user-level KeeperException when processing sessionid:0x----------- type:create cxid:0x5 zxid:0x29 txntype:-1 reqpath:n/a Error Path:/hedwig/standalone/topics Error:KeeperErrorCode = NoNode for /hedwig/standalone/topics". Normally this znode path is created automatically. Also, some bookies complained with this: "WARN [NIOServerFactory] org.apache.bookkeeper.proto.NIOServerFactory - Exception in server socket loop: /0:0:0:0:0:0:0:0
java.lang.NullPointerException". For some reason, this problem comes and goes. Sometimes everything just works: the new topic is saved in a new znode, and the message is saved in the bookie(s). I spent hours trying to recreate this yesterday, but I couldn't. Now it is back again. Subscribe seems to have a similar issue.
0% 0% 1209600 1209600 217600 No Perforce job exists for this issue. 0 32633
8 years, 19 weeks ago hedwig-client, hedwig, bookies 0|i05y5r:
ZooKeeper ZOOKEEPER-1303

Observer LearnerHandlers are not removed from Leader collection.

Bug Resolved Minor Duplicate Ashish Mishra Ashish Mishra Ashish Mishra 17/Nov/11 18:32   30/Apr/14 16:25 30/Apr/14 16:25 3.3.4 3.4.4, 3.5.0 scripts   1 4 604800 604800 0% The Leader.removeLearnerHandler() call removes handlers from the forwardingFollowers and learners sets, but not from observingLearners. This will cause a leak if observers are repeatedly connected and disconnected from the ensemble. 0% 0% 604800 604800 217599 No Perforce job exists for this issue. 1 32634
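A simplified model of the leak and its fix, assuming a generic handler type and a hypothetical `LeaderRegistry` class (the set names mirror those in the report, but this is not ZooKeeper's Leader class). Removal has to touch every collection a handler can be added to, including observingLearners.

{code:java}
import java.util.HashSet;
import java.util.Set;

// Simplified model of the leader's learner bookkeeping (hypothetical class).
class LeaderRegistry<H> {
    final Set<H> learners = new HashSet<>();
    final Set<H> forwardingFollowers = new HashSet<>();
    final Set<H> observingLearners = new HashSet<>();

    synchronized void addFollower(H h) {
        learners.add(h);
        forwardingFollowers.add(h);
    }

    synchronized void addObserver(H h) {
        learners.add(h);
        observingLearners.add(h);
    }

    // Without the observingLearners removal, every observer reconnect would leave
    // a stale handler behind, which is the leak described in this issue.
    synchronized void removeLearnerHandler(H h) {
        forwardingFollowers.remove(h);
        learners.remove(h);
        observingLearners.remove(h);
    }
}
{code}

Connecting and disconnecting 100 observers against this model leaves observingLearners empty; dropping the third remove call would leave all 100 behind.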
5 years, 47 weeks, 1 day ago 0|i05y5z:
ZooKeeper ZOOKEEPER-1302

patch to create rpm/deb on 3.3 branch

Improvement Resolved Major Won't Fix Giridharan Kesavan Giridharan Kesavan Giridharan Kesavan 16/Nov/11 16:41   17/Jan/12 16:28 17/Nov/11 00:08 3.3.3   build   0 0   Backport the ZOOKEEPER-999 patch to the 3.3 branch and add zookeeper-setup-conf.sh to enable ZK quorum setup 217434 No Perforce job exists for this issue. 3 33294
8 years, 10 weeks, 2 days ago 0|i0628n:
ZooKeeper ZOOKEEPER-1301

backport patches related to the zk startup script from 3.4 to 3.3 release

Improvement Closed Major Fixed Giridharan Kesavan Giridharan Kesavan Giridharan Kesavan 16/Nov/11 15:00   29/Nov/11 12:54 17/Nov/11 00:22 3.3.4 3.3.4     0 0   217413 No Perforce job exists for this issue. 3 33295
8 years, 19 weeks ago
Reviewed
0|i0628v:
ZooKeeper ZOOKEEPER-1300

Rat complains about inconsistent licenses in the src files.

Bug Resolved Major Duplicate Mahadev Konar Mahadev Konar Mahadev Konar 16/Nov/11 14:43   21/Jul/14 16:50 21/Jul/14 16:50 3.4.0 3.5.0     0 0   From phunt:

{noformat}
Note: I even tried upgrading to RAT 0.8 and this is the output: (same/similar)

[rat:report] 15 Unknown Licenses
[rat:report]
[rat:report] *******************************
[rat:report]
[rat:report] Unapproved licenses:
[rat:report]
[rat:report] /home/phunt/Downloads/zookeeper-3.4.0/build/zookeeper-3.4.0/README_packaging.txt
[rat:report] /home/phunt/Downloads/zookeeper-3.4.0/build/zookeeper-3.4.0/contrib/ZooInspector/licences/epl-v10.html
[rat:report] /home/phunt/Downloads/zookeeper-3.4.0/build/zookeeper-3.4.0/src/c/include/winstdint.h
[rat:report] /home/phunt/Downloads/zookeeper-3.4.0/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/log4j.properties
[rat:report] /home/phunt/Downloads/zookeeper-3.4.0/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/date.format.js
[rat:report] /home/phunt/Downloads/zookeeper-3.4.0/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/g.bar.js
[rat:report] /home/phunt/Downloads/zookeeper-3.4.0/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/g.dot.js
[rat:report] /home/phunt/Downloads/zookeeper-3.4.0/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/g.line.js
[rat:report] /home/phunt/Downloads/zookeeper-3.4.0/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/g.pie.js
[rat:report] /home/phunt/Downloads/zookeeper-3.4.0/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/g.raphael.js
[rat:report] /home/phunt/Downloads/zookeeper-3.4.0/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/raphael.js
[rat:report] /home/phunt/Downloads/zookeeper-3.4.0/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/yui-min.js
[rat:report] /home/phunt/Downloads/zookeeper-3.4.0/build/zookeeper-3.4.0/src/contrib/monitoring/JMX-RESOURCES
[rat:report] /home/phunt/Downloads/zookeeper-3.4.0/build/zookeeper-3.4.0/src/contrib/zooinspector/lib/log4j.properties
[rat:report] /home/phunt/Downloads/zookeeper-3.4.0/build/zookeeper-3.4.0/src/contrib/zooinspector/licences/epl-v10.html
{noformat}
217410 No Perforce job exists for this issue. 0 32635
5 years, 35 weeks, 3 days ago 0|i05y67:
ZooKeeper ZOOKEEPER-1299

Add winconfig.h file to ignore in release audit.

Bug Closed Major Fixed Mahadev Konar Mahadev Konar Mahadev Konar 16/Nov/11 01:53   23/Nov/11 14:22 16/Nov/11 02:19 3.4.0 3.4.0     0 1   We need to add the winconfig.h to ignores in release audits. 217315 No Perforce job exists for this issue. 0 32636
8 years, 19 weeks, 1 day ago 0|i05y6f:
ZooKeeper ZOOKEEPER-1298

config.h gets emptied by make, at least on Mac OS X 10.6.8

Bug Open Major Unresolved Unassigned Jim Fulton Jim Fulton 12/Nov/11 12:29   12/Nov/11 12:29   3.3.3   c client   0 0   Mac OS X 10.6.8 configure creates a working config.h.

On Snow Leopard, after running configure:

ls -l config.h
-rw-r--r-- 1 jim jim 4437 Nov 12 12:16 config.h

which looks reasonable.

Running make replaces config.h:

make
(CDPATH="${ZSH_VERSION+.}:" && cd . && /bin/sh /Users/jim/s/zookeeper-3.3.3/src/c/missing --run autoheader)
/opt/local/bin/gm4: cannot open `configure.in': No such file or directory
rm -f stamp-h1
touch config.h.in
cd . && /bin/sh ./config.status config.h
config.status: creating config.h
make all-am
/bin/sh ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I./include -I./tests -I./generated -Wall -Werror -g -O2 -D_GNU_SOURCE -MT zookeeper.lo -MD -MP -MF .deps/zookeeper.Tpo -c -o zookeeper.lo `test -f 'src/zookeeper.c' || echo './'`src/zookeeper.c
libtool: compile: gcc -DHAVE_CONFIG_H -I. -I./include -I./tests -I./generated -Wall -Werror -g -O2 -D_GNU_SOURCE -MT zookeeper.lo -MD -MP -MF .deps/zookeeper.Tpo -c src/zookeeper.c -fno-common -DPIC -o .libs/zookeeper.o
src/zookeeper.c: In function 'log_env':
src/zookeeper.c:658: error: 'PACKAGE_STRING' undeclared (first use in this function)
src/zookeeper.c:658: error: (Each undeclared identifier is reported only once
src/zookeeper.c:658: error: for each function it appears in.)
cc1: warnings being treated as errors
src/zookeeper.c:647: warning: unused variable 'buf'
make[1]: *** [zookeeper.lo] Error 1
make: *** [all] Error 2


ls -l config.h
-rw-r--r-- 1 jim jim 137 Nov 12 12:17 config.h

config.h is empty, except for a comment.

If I make a copy of config.h after configure and restore it after
running the failed make, then I can run make again and the make
succeeds.

On a centos 5 vm, I can build just fine, but I suspect that has
something to do with it not being happy with autoconf:

(CDPATH="${ZSH_VERSION+.}:" && cd . && /bin/sh /home/jim/s/zookeeper-3.3.3/src/c/missing --run autoheader)
aclocal.m4:20: warning: this file was generated for autoconf 2.67.
You have another version of autoconf. It may work, but is not guaranteed to.
If you have problems, you may need to regenerate the build system entirely.
To do so, use the procedure documented by the package, typically `autoreconf'.
configure.ac:21: error: Autoconf version 2.62 or higher is required
aclocal.m4:8577: AM_INIT_AUTOMAKE is expanded from...
configure.ac:21: the top level
autom4te: /usr/bin/m4 failed with exit status: 63
autoheader: /usr/bin/autom4te failed with exit status: 63
WARNING: `autoheader' is probably too old. You should only need it if
you modified `acconfig.h' or `configure.ac'. You might want
to install the `Autoconf' and `GNU m4' packages. Grab them
from any GNU archive site.
rm -f stamp-h1
touch config.h.in
cd . && /bin/sh ./config.status config.h
config.status: creating config.h
config.status: config.h is unchanged
...

I'm pretty clueless wrt autoconf.

I can work around this by touching config.h.in before running
configure. That seems to lead to a clean make, presumably by bypassing
the autoconf step. I don't know if that matters. :)

My goal is to automate building on at least unix-like systems as part
of building a self-contained source distribution of the Python
extension that builds by just running its setup script.
216971 No Perforce job exists for this issue. 0 32637
8 years, 19 weeks, 5 days ago 0|i05y6n:
ZooKeeper ZOOKEEPER-1297

Add stat information to create() call

New Feature Resolved Major Fixed Lenni Kuff Gunnar Wagenknecht Gunnar Wagenknecht 11/Nov/11 07:51   23/Dec/13 19:10 19/Dec/12 13:17 3.3.3 3.5.0 java client   0 3   In order to get a Stat object after creation, one has to make another exists() call. This leaves client code vulnerable to a window in which another writer may update the node. All synchronous methods except create() allow passing in a Stat object to be populated. It would be nice if the create() method allowed this too. newbie 216867 No Perforce job exists for this issue. 3 2407
7 years, 14 weeks ago
Reviewed
0|i00rmf:
ZooKeeper ZOOKEEPER-1296

Add zookeeper-setup-conf.sh script

Improvement Open Minor Unresolved Eric Yang Eric Yang Eric Yang 09/Nov/11 17:40   05/Feb/20 07:17   3.4.0 3.7.0, 3.5.8 scripts   0 2   Shell script It would be nice to provide a setup script for zoo.cfg and zookeeper-env.sh. The proposed script will provide the following options:

{noformat}
usage: /usr/sbin/zookeeper-setup-conf.sh <parameters>
Required parameters:
--conf-dir Set ZooKeeper configuration directory
--log-dir Set ZooKeeper log directory
Optional parameters:
--auto-purge-interval=1 Set snapshot auto purge interval
--client-port=2181 Set client port
--data-dir=/var/lib/zookeeper Set data directory
--hosts=host1,host2 Set ZooKeeper quorum hostnames
--init-limit=10 Set initial sync limit
--java-home Set JAVA_HOME location
--snapshot-count=3 Set snapshot retain count
--sync-limit=5 Set sync limit
--tick-time=2000 Set milliseconds of each tick
{noformat}
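A sketch of how the script's optional parameters could map onto zoo.cfg keys, written as a hypothetical Java helper (ZooCfgBuilder is not part of ZooKeeper). The defaults are the ones in the usage text. --conf-dir, --log-dir and --java-home are left out because they affect file locations and zookeeper-env.sh rather than zoo.cfg. The quorum and election ports 2888:3888 in the server.N lines are an assumption (the conventional values from the ZooKeeper documentation); the usage text does not specify them.

{code:java}
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: renders zoo.cfg text from the script's optional parameters.
final class ZooCfgBuilder {
    private ZooCfgBuilder() {}

    static String build(Map<String, String> opts, List<String> hosts) {
        // LinkedHashMap keeps the keys in a stable, readable order.
        Map<String, String> cfg = new LinkedHashMap<>();
        cfg.put("tickTime", opts.getOrDefault("tick-time", "2000"));
        cfg.put("initLimit", opts.getOrDefault("init-limit", "10"));
        cfg.put("syncLimit", opts.getOrDefault("sync-limit", "5"));
        cfg.put("dataDir", opts.getOrDefault("data-dir", "/var/lib/zookeeper"));
        cfg.put("clientPort", opts.getOrDefault("client-port", "2181"));
        cfg.put("autopurge.purgeInterval", opts.getOrDefault("auto-purge-interval", "1"));
        cfg.put("autopurge.snapRetainCount", opts.getOrDefault("snapshot-count", "3"));
        // A single host runs standalone and needs no server.N entries; ids start at 1.
        if (hosts.size() > 1) {
            for (int i = 0; i < hosts.size(); i++) {
                cfg.put("server." + (i + 1), hosts.get(i) + ":2888:3888");
            }
        }
        StringBuilder sb = new StringBuilder();
        cfg.forEach((k, v) -> sb.append(k).append('=').append(v).append('\n'));
        return sb.toString();
    }
}
{code}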
216672 No Perforce job exists for this issue. 4 41996
5 years, 51 weeks, 3 days ago 0|i07jxj:
ZooKeeper ZOOKEEPER-1295

Documentation for jute.maxbuffer is not correct in ZooKeeper Administrator's Guide

Bug Resolved Major Fixed Mohammad Arshad Daniel Lord Daniel Lord 09/Nov/11 14:23   27/Oct/19 05:28 28/May/16 13:00 3.5.2   documentation   0 6   The jute maxbuffer size is documented as being defaulted to 1 megabyte in the administrators guide. I believe that this is true server side but it is not true client side. On the client side the default is (at least in 3.3.2) this:

packetLen = Integer.getInteger("jute.maxbuffer", 4096 * 1024);

On the server side the documentation looks to be correct:
private static int determineMaxBuffer() {
    String maxBufferString = System.getProperty("jute.maxbuffer");
    try {
        return Integer.parseInt(maxBufferString);
    } catch(Exception e) {
        return 0xfffff;
    }
}

The documentation states this:
jute.maxbuffer:
(Java system property: jute.maxbuffer)

This option can only be set as a Java system property. There is no zookeeper prefix on it. It specifies the maximum size of the data that can be stored in a znode. The default is 0xfffff, or just under 1M. If this option is changed, the system property must be set on all servers and clients otherwise problems will arise. This is really a sanity check. ZooKeeper is designed to store data on the order of kilobytes in size.
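The mismatch is easy to see by evaluating both quoted defaults with the property unset: the client falls back to 4096 * 1024 = 4,194,304 bytes (4 MiB), while the server falls back to 0xfffff = 1,048,575 bytes, one byte short of 1 MiB. The sketch below reproduces the two snippets side by side (JuteMaxBufferDefaults is an illustrative name; the server version here catches NumberFormatException, which is what Integer.parseInt throws for a missing or malformed value).

{code:java}
// Reproduces the client and server defaults quoted above (illustrative class name).
final class JuteMaxBufferDefaults {
    private JuteMaxBufferDefaults() {}

    // Client side, as quoted from 3.3.2: Integer.getInteger with a 4 MiB fallback.
    static int clientDefault() {
        return Integer.getInteger("jute.maxbuffer", 4096 * 1024);
    }

    // Server side, as quoted: parse the property, falling back to 0xfffff.
    static int serverDefault() {
        String maxBufferString = System.getProperty("jute.maxbuffer");
        try {
            return Integer.parseInt(maxBufferString);
        } catch (NumberFormatException e) {
            return 0xfffff;
        }
    }
}
{code}

Setting -Djute.maxbuffer explicitly on both clients and servers makes the two values agree, as the documentation already recommends.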
newbie 216652 No Perforce job exists for this issue. 0 32638
3 years, 42 weeks, 5 days ago 0|i05y6v:
ZooKeeper ZOOKEEPER-1294

One of the ZooKeeper servers is not accepting any requests

Bug Resolved Major Fixed kavita sharma amith amith 09/Nov/11 04:54   24/May/13 20:17 13/Jan/12 19:08   3.5.0 server   0 7   3 Zookeeper + 3 Observer with SuSe-11 In zoo.cfg I have configured:
server.1 = XX.XX.XX.XX:65175:65173
server.2 = XX.XX.XX.XX:65185:65183
server.3 = XX.XX.XX.XX:65195:65193
server.4 = XX.XX.XX.XX:65205:65203:observer
server.5 = XX.XX.XX.XX:65215:65213:observer
server.6 = XX.XX.XX.XX:65225:65223:observer

As above, I have configured 3 PARTICIPANTS and 3 OBSERVERS
in a cluster of 6 ZooKeeper servers.

Steps to reproduce the defect
1. Start all the 3 participant zookeeper
2. Stop all the participant zookeeper
3. Start zookeeper 1(Participant)
4. Start zookeeper 2(Participant)
5. Start zookeeper 4(Observer)
6. Create a persistent node with external client and close it
7. Stop zookeeper 1 (Participant; the quorum is now unstable)
8. Create a new client and try to find the node created before using the exists API (this will fail since the quorum is not satisfied)
9. Start zookeeper 1 (Participant; the quorum stabilises again)

Now check the observer (server.4) using the stat four-letter word:
linux-216:/home/amith/CI/source/install/zookeeper/zookeeper2/bin # echo stat | netcat localhost 65200
Zookeeper version: 3.3.2-1031432, built on 11/05/2010 05:32 GMT
Clients:
/127.0.0.1:46370[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/0
Received: 1
Sent: 0
Outstanding: 0
Zxid: 0x100000003
Mode: observer
Node count: 5

Check participant 2 with the four-letter word:

Latency min/avg/max: 22/48/83
Received: 39
Sent: 3
Outstanding: 35
Zxid: 0x100000003
Mode: leader
Node count: 5
linux-216:/home/amith/CI/source/install/zookeeper/zookeeper2/bin #

Check participant 1 with the four-letter word:

linux-216:/home/amith/CI/source/install/zookeeper/zookeeper2/bin # echo stat | netcat localhost 65170
This ZooKeeper instance is not currently serving requests

The participant 1 log is filled with:
2011-11-08 15:49:51,360 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:65170:NIOServerCnxn@642] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running


The problem is that participant 1 is not responding to or accepting any requests.
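For reference, the `echo stat | netcat localhost 65200` check used above can also be scripted from Java, for example in a monitoring probe. This is a minimal sketch, assuming the server reads the command on its client port and, as with netcat, writes its reply and then closes the connection; the class name `FourLetterWordClient` is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: sends a four-letter word (e.g. "stat") and returns
// everything the server writes back before it closes the socket,
// mirroring `echo stat | netcat host port`.
final class FourLetterWordClient {
    private FourLetterWordClient() {}

    static String send(String host, int port, String cmd, int timeoutMs) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs); // do not hang forever on a stuck server
            OutputStream out = socket.getOutputStream();
            out.write(cmd.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            // Half-close like netcat does when stdin ends; the reply still arrives.
            socket.shutdownOutput();
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) { // -1 once the server closes
                buf.write(chunk, 0, n);
            }
            return buf.toString(StandardCharsets.UTF_8.name());
        }
    }
}
```

Called against participant 1 in the state described above (its client port is 65170 in this setup), such a probe should get back the "not currently serving requests" text instead of the usual statistics, which makes the condition easy to alert on.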
216580 No Perforce job exists for this issue. 4 32639
6 years, 43 weeks, 6 days ago
Incompatible change, Reviewed
0|i05y73:
ZooKeeper ZOOKEEPER-1293

Remove unused readyToStart from Leader.java

Improvement Resolved Trivial Fixed Alexander Shraer Alexander Shraer Alexander Shraer 09/Nov/11 00:31   06/Jan/12 20:24 05/Jan/12 20:33   3.5.0 server, tests   0 0   After ZOOKEEPER-1194 readyToStart is no longer used. 216559 No Perforce job exists for this issue. 3 33296
8 years, 11 weeks, 5 days ago
Reviewed
0|i06293:
ZooKeeper ZOOKEEPER-1292

FLETest is flaky

Improvement Resolved Major Fixed Flavio Paiva Junqueira Flavio Paiva Junqueira Flavio Paiva Junqueira 07/Nov/11 16:46   24/Dec/11 05:57 23/Dec/11 14:47   3.5.0 leaderElection   0 0   testLE in FLETest is convoluted, difficult to read, and doesn't test FLE appropriately. The goal of this jira is to clean it up and propose a more reasonable test case. 216374 No Perforce job exists for this issue. 3 33297
8 years, 13 weeks, 5 days ago 0|i0629b:
ZooKeeper ZOOKEEPER-1291

ZOOKEEPER-1264 AcceptedEpoch not updated at leader before it proposes the epoch to followers

Sub-task Closed Major Fixed Alexander Shraer Alexander Shraer Alexander Shraer 05/Nov/11 16:01   23/Nov/11 14:22 05/Nov/11 16:59 3.4.0 3.4.0, 3.5.0 server   0 1   It is possible that a leader proposes an epoch e and a follower adopts it by setting acceptedEpoch to e but the leader itself hasn't yet done so.

While I'm not sure this contradicts Zab (there is no description of where the leader actually sets its acceptedEpoch), it is very counterintuitive.

The fix is to set acceptedEpoch in getEpochToPropose, i.e., before any LearnerHandler passes the getEpochToPropose barrier.

The fix is done as part of ZOOKEEPER-1264.
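A simplified sketch of the ordering described above, not the actual Leader.java code: learners report their last accepted epoch, and once a quorum (assumed here to already include the leader) has reported, the leader picks max + 1, adopts it as its own acceptedEpoch, and only then wakes the waiting handlers. `EpochBarrier` and its members are hypothetical names; the real barrier also uses timeouts, which are omitted.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical epoch barrier illustrating "set acceptedEpoch before anyone
// passes the barrier".
final class EpochBarrier {
    private final int quorumSize;
    private final Set<Long> connected = new HashSet<>();
    private long maxEpochSeen;
    private long epochToPropose = -1; // -1 until a quorum has reported
    private long leaderAcceptedEpoch;

    EpochBarrier(long leaderSid, long leaderLastAcceptedEpoch, int quorumSize) {
        this.quorumSize = quorumSize;
        this.maxEpochSeen = leaderLastAcceptedEpoch;
        this.leaderAcceptedEpoch = leaderLastAcceptedEpoch;
        connected.add(leaderSid); // simplification: the leader counts immediately
    }

    synchronized long getEpochToPropose(long sid, long lastAcceptedEpoch) throws InterruptedException {
        if (epochToPropose >= 0) {
            return epochToPropose; // already decided
        }
        maxEpochSeen = Math.max(maxEpochSeen, lastAcceptedEpoch);
        connected.add(sid);
        if (connected.size() >= quorumSize) {
            epochToPropose = maxEpochSeen + 1;
            // The fix: the leader adopts the epoch *before* releasing anyone.
            leaderAcceptedEpoch = epochToPropose;
            notifyAll();
        }
        while (epochToPropose < 0) {
            wait();
        }
        return epochToPropose;
    }

    synchronized long leaderAcceptedEpoch() {
        return leaderAcceptedEpoch;
    }
}
```

Because the assignment happens inside the same synchronized block that releases the waiters, no handler can observe the proposed epoch while the leader's acceptedEpoch still holds the old value.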
216187 No Perforce job exists for this issue. 0 33298
8 years, 20 weeks, 4 days ago Revision 1198053 0|i0629j:
ZooKeeper ZOOKEEPER-1290

zookeeper_init_with_watches

New Feature Open Major Unresolved Unassigned Marc Celani Marc Celani 04/Nov/11 22:04   05/Nov/11 02:06       c client   0 0   Our use of zookeeper requires high scalability, and the underlying data set is small and changes infrequently. A persisted cache is ideal for solving scalability. We want to treat a restart as if it were a prolonged reconnect - that is, maintain the last known zxid and watch list. We would like to expose a new zookeeper_init_with_watches api that allows the zhandle to be initialized with the watch list and last known zxid. The change would reuse the current reconnect logic. 216145 No Perforce job exists for this issue. 0 41997
8 years, 20 weeks, 5 days ago 0|i07jxr:
ZooKeeper ZOOKEEPER-1289

Multi Op Watch Events

New Feature Open Major Unresolved Unassigned Marc Celani Marc Celani 04/Nov/11 21:48   30/Nov/11 18:08       c client, java client, server   0 0   Caches built on top of ZooKeeper clients can become inconsistent because of the lack of multi-op watches. Our clients receive watch notifications for paths one at a time and, in their watch handling, invalidate the path in the cache. However, the cache then has an inconsistent view of ZooKeeper, since it receives the notifications one at a time. In general, the watch handling semantics do not conform to the idea of a multi op. If changes can be made to multiple paths atomically, all clients should be notified of that change atomically. 216143 No Perforce job exists for this issue. 0 41998
8 years, 17 weeks, 1 day ago 0|i07jxz:
ZooKeeper ZOOKEEPER-1288

ZOOKEEPER-1198 Always log sessionId and zxid as hexadecimals

Sub-task Open Major Unresolved Unassigned Thomas Koch Thomas Koch 04/Nov/11 14:33   14/Jun/18 15:42           0 0   In some places, session IDs or zxids are written to the log as decimal numbers, but most of the time they are written as hexadecimal. It's an unnecessary hassle to convert these numbers manually to find additional log lines referring to the same numbers. Or worse, people may not know that there may be additional information available if they also search for the decimal representation of a number. 216099 No Perforce job exists for this issue. 0 41999
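A minimal sketch of consistent hexadecimal formatting, assuming a small shared helper (`HexIds` is a hypothetical name). `Long.toHexString` treats the value as unsigned, so session IDs with the high bit set print without a minus sign, and `Long.parseUnsignedLong` converts back. For example, the zxid shown as `0x100000003` by `stat` is `4294967299` in decimal, exactly the kind of mismatch that makes log searches miss lines.

```java
// Hypothetical helper: one formatting rule for session IDs and zxids so a
// single search string finds every log line mentioning a value.
final class HexIds {
    private HexIds() {}

    // Unsigned hex with a 0x prefix, e.g. 4294967299L -> "0x100000003".
    static String hex(long id) {
        return "0x" + Long.toHexString(id);
    }

    // Inverse of hex(); accepts the value with or without the 0x prefix.
    static long fromHex(String s) {
        String digits = s.startsWith("0x") ? s.substring(2) : s;
        return Long.parseUnsignedLong(digits, 16);
    }
}
```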
8 years, 20 weeks, 5 days ago 0|i07jy7:
ZooKeeper ZOOKEEPER-1287

ZOOKEEPER-1285 DataTree deserialization methods should return DataTree instance

Sub-task Open Minor Unresolved Unassigned Thomas Koch Thomas Koch 04/Nov/11 13:28   14/Jun/18 15:42           0 0   There are a couple of deserialization methods that all receive a new DataTree instance as a parameter and forward it down the chain until the last method in the chain populates it. While this pattern is derived from jute, there's no reason not to instantiate a new DataTree object in the last deserialization method and return it up the stack. That makes the code easier to reason about, because it is then obvious that the DataTree instance being worked on is indeed a new instance. 216093 No Perforce job exists for this issue. 0 42000
8 years, 20 weeks, 6 days ago 0|i07jyf:
ZooKeeper ZOOKEEPER-1286

ZOOKEEPER-1198 QuorumPeer contains unused constructor

Sub-task Open Trivial Unresolved Thomas Koch Thomas Koch Thomas Koch 04/Nov/11 03:58   04/Nov/11 03:58           0 0   The following constructor in QuorumPeer seems to be never used, starting at line 370 in my branch:
{code:java}
/**
 * For backward compatibility purposes, we instantiate QuorumMaj by default.
 */
public QuorumPeer(Map<Long, QuorumServer> quorumPeers, File dataDir,
        File dataLogDir, int electionType,
        long myid, int tickTime, int initLimit, int syncLimit,
        ServerCnxnFactory cnxnFactory) throws IOException {
    this(quorumPeers, dataDir, dataLogDir, electionType, myid, tickTime,
            initLimit, syncLimit, cnxnFactory,
            new QuorumMaj(countParticipants(quorumPeers)));
}
{code}
216018 No Perforce job exists for this issue. 0 42001
8 years, 20 weeks, 6 days ago 0|i07jyn:
ZooKeeper ZOOKEEPER-1285

make DataTree immutable

Improvement Open Major Unresolved Thomas Koch Thomas Koch Thomas Koch 03/Nov/11 10:38   01/May/13 22:29           0 1   ZOOKEEPER-1287 Having an immutable DataTree structure in the ZooKeeper server is an ambitious goal but is possible. Advantages would be:

- No synchronization needed when accessing the DataTree.
- The snapshotter thread gets an immutable datatree and will write a consistent DataTree to the disk.
- No headaches whether multi transactions could lead to issues with (de)serialization.
- Much better testability.
- No concurrency - No headaches.
- I hope for considerable speed improvements. Maybe also some memory savings, at least from refactorings possible after this step.
- Statistical Data about the tree can be updated on every tree modification and is always consistent. The need to save statistical data in extra nodes for the quota feature goes away.

Possible further improvements:

Read requests don't actually need to enter the processor pipeline. Instead, each server connection could get a reference to a (zxid, tree) tuple. Updates are delivered to the server connections as (zxid, newTree, triggerWatchesCallback).
The watches could be managed at each server connection instead of centrally at the DataTree.
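To make the (zxid, tree) idea concrete, here is a minimal copy-on-write sketch. Assumptions: a single writer applies committed updates in order, node data is modelled as a `String`, and the whole map is copied on each update for simplicity (a real implementation would need a persistent tree with structural sharing to keep updates cheap). `TreeSnapshot` and `VersionedTree` are hypothetical names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical immutable (zxid, tree) pair: once published it never changes.
final class TreeSnapshot {
    final long zxid;
    final Map<String, String> nodes; // path -> data, unmodifiable

    TreeSnapshot(long zxid, Map<String, String> nodes) {
        this.zxid = zxid;
        this.nodes = Collections.unmodifiableMap(nodes);
    }
}

final class VersionedTree {
    private final AtomicReference<TreeSnapshot> current =
            new AtomicReference<>(new TreeSnapshot(0L, new HashMap<>()));

    // Readers (connections, the snapshotter) grab a consistent view without locking.
    TreeSnapshot read() {
        return current.get();
    }

    // Assumes a single writer applying committed txns in zxid order.
    // Copies the whole map for simplicity; see the lead-in.
    TreeSnapshot apply(long zxid, String path, String data) {
        TreeSnapshot old = current.get();
        Map<String, String> copy = new HashMap<>(old.nodes);
        copy.put(path, data);
        TreeSnapshot next = new TreeSnapshot(zxid, copy);
        current.set(next); // publish; older snapshots stay valid for their readers
        return next;
    }
}
```

A snapshotter holding an older `TreeSnapshot` can keep serializing it while newer versions are published, which is the "consistent DataTree on disk" advantage listed above.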
215914 No Perforce job exists for this issue. 0 42002
8 years, 20 weeks, 6 days ago 0|i07jyv:
ZooKeeper ZOOKEEPER-1284

ZOOKEEPER-1198 Cleanup minor PrepRequestProcessor issues

Sub-task Patch Available Minor Unresolved Thomas Koch Thomas Koch Thomas Koch 03/Nov/11 05:35   03/Nov/11 06:13           0 1   Instead of having if statements in every switch case in pRequest2Txn, it is possible to have only one if statement before the switch case in pRequest. 215882 No Perforce job exists for this issue. 1 42003
8 years, 21 weeks ago 0|i07jz3:
ZooKeeper ZOOKEEPER-1283

building 3.3 branch fails with Ant 1.8.2 (success with 1.7.1 though)

Bug Closed Blocker Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 03/Nov/11 01:35   29/Nov/11 12:54 15/Nov/11 13:10 3.3.3 3.3.4 build   0 1   I tried to compile 3.3.3 and the current 3.3 branch head; in both cases the build fails with Ant 1.8.2, while it succeeds with Ant 1.7.0.

Here's the error:
{noformat}
Testsuite: org.apache.zookeeper.VerGenTest
Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 0.009 sec

Testcase: warning took 0.001 sec
FAILED
Class org.apache.zookeeper.VerGenTest has no public constructor TestCase(String name) or TestCase()
junit.framework.AssertionFailedError: Class org.apache.zookeeper.VerGenTest has no public constructor TestCase(String name) or TestCase()
{noformat}
215854 No Perforce job exists for this issue. 1 32640
8 years, 19 weeks, 6 days ago committed revision 1202340 0|i05y7b:
ZooKeeper ZOOKEEPER-1282

ZOOKEEPER-1264 Learner.java not following Zab 1.0 protocol - setCurrentEpoch should be done upon receipt of NEWLEADER (before acking it) and not upon receipt of UPTODATE

Sub-task Closed Major Fixed Benjamin Reed Alexander Shraer Alexander Shraer 02/Nov/11 18:33   23/Nov/11 14:22 05/Nov/11 16:58 3.4.0 3.4.0, 3.5.0 server   0 0   According to https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab1.0
phase 2 part 2, "Once it receives NEWLEADER(e) it atomically applies
the new state and sets f.currentEpoch =e. "


In Learner.java, self.setCurrentEpoch(newEpoch) is done after receiving
UPTODATE, not before acking the NEWLEADER message as it should be:

{code:java}
            case Leader.UPTODATE:
                if (!snapshotTaken) {
                    zk.takeSnapshot();
                }
                self.cnxnFactory.setZooKeeperServer(zk);
                break outerLoop;
            case Leader.NEWLEADER: // it will be NEWLEADER in v1.0
                zk.takeSnapshot();
                snapshotTaken = true;
                writePacket(new QuorumPacket(Leader.ACK,
                        newLeaderZxid, null, null), true);
                break;
            }
        }
    }
    long newEpoch = ZxidUtils.getEpochFromZxid(newLeaderZxid);
    self.setCurrentEpoch(newEpoch);
{code}
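For comparison, a heavily simplified sketch of the ordering required by the Zab 1.0 text quoted above: on NEWLEADER the follower applies the new state, sets currentEpoch, and only then acks, while UPTODATE merely starts serving. This is not Learner.java; `LearnerEpochSketch`, its enum, and the `epochAtAck` list (which stands in for sending the ACK packet) are hypothetical, and snapshotting and disk persistence are elided.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical follower state machine showing only the epoch ordering.
final class LearnerEpochSketch {
    enum PacketType { NEWLEADER, UPTODATE }

    private long currentEpoch;
    private boolean serving;
    // currentEpoch as it was at the moment each ACK was "sent".
    private final List<Long> epochAtAck = new ArrayList<>();

    static long epochOf(long zxid) {
        return zxid >>> 32; // upper 32 bits of the zxid hold the epoch
    }

    void onPacket(PacketType type, long newLeaderZxid) {
        switch (type) {
            case NEWLEADER:
                // Apply new state (snapshot elided), set currentEpoch, then ACK.
                currentEpoch = epochOf(newLeaderZxid);
                epochAtAck.add(currentEpoch); // stands in for writePacket(ACK)
                break;
            case UPTODATE:
                // Only switches to serving clients; the epoch was set above.
                serving = true;
                break;
            default:
                throw new IllegalStateException("unhandled packet " + type);
        }
    }

    long currentEpoch() { return currentEpoch; }
    boolean isServing() { return serving; }
    List<Long> epochAtAck() { return Collections.unmodifiableList(epochAtAck); }
}
```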
215824 No Perforce job exists for this issue. 0 33299
8 years, 20 weeks, 4 days ago Revision 1198053 0|i0629r:
ZooKeeper ZOOKEEPER-1281

Stat and srvr 4 letter commands are useless on the leader when leaderServes = false

Improvement Open Major Unresolved Unassigned Daniel Lord Daniel Lord 02/Nov/11 13:25   18/Sep/17 18:46   3.3.3   server   2 4   When leaderServes = false, the leader responds to the stat/srvr four-letter words with simply "this ZooKeeper instance is not currently serving requests". While I agree that is an accurate statement, it's not terribly useful for monitoring programs. Additionally, if members of the ensemble are not currently in the quorum, it becomes impossible to tell who is out of the quorum and who is the leader of the quorum.

I'm not sure if the leader should have a specially formatted response for stat/srvr or if it should simply display less information (no connections for example).
215765 No Perforce job exists for this issue. 0 42004
2 years, 26 weeks, 3 days ago 0|i07jzb:
ZooKeeper ZOOKEEPER-1280

Add current epoch number and timestamp of when it began to 4 letter words (stat, srvr, mntr maybe?)

Improvement Open Major Unresolved Unassigned Daniel Lord Daniel Lord 02/Nov/11 13:21   02/Nov/11 13:21   3.3.3   server   0 1   It would be nice if there were some stats displayed about the current epoch in the 4 letter words. At the very least it would be nice to expose the current epoch number (I know I could parse it from the Zxid but exposing it directly is more transparent) and the date of when the epoch began. 215763 No Perforce job exists for this issue. 0 42005
8 years, 21 weeks, 1 day ago 0|i07jzj:
ZooKeeper ZOOKEEPER-1279

ZOOKEEPER-1198 Only SessionTracker should hold reference to sessionsWithTimeouts

Sub-task Open Minor Unresolved Thomas Koch Thomas Koch Thomas Koch 02/Nov/11 13:13   01/May/13 22:29           0 1   Currently the ZKDataBase, ZooKeeperServer and SessionTrackers hold references to the same map, called sessionsWithTimeouts everywhere. That's very confusing. It is possible to have the reference only in the SessionTrackers and take it from there if it should ever be needed outside. 215761 No Perforce job exists for this issue. 0 42006
8 years, 21 weeks, 1 day ago 0|i07jzr:
ZooKeeper ZOOKEEPER-1278

acceptedEpoch not handling zxid rollover in lower 32bits

Bug Resolved Blocker Duplicate Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 02/Nov/11 12:54   22/Mar/12 20:36 22/Mar/12 20:36 3.4.0, 3.5.0   server   0 0   When the lower 32 bits of a zxid "roll over" (the zxid is a 64-bit number, but the upper 32 bits are considered the epoch number), the epoch number (upper 32 bits) is incremented and the lower 32 bits start at 0 again.

This should work fine, however, afaict, in the current 3.4/3.5 the acceptedEpoch/currentEpoch files are not being updated for this case.

See ZOOKEEPER-335 for changes from 3.3 branch.
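The zxid layout described above comes down to a few lines of bit arithmetic, mirroring what `ZxidUtils.getEpochFromZxid` (used in Learner.java, see ZOOKEEPER-1282) computes for the epoch. The class and method names below are hypothetical. The sketch shows that a plain increment past `0xffffffff` carries into the upper 32 bits, so the epoch changes without a leader election, which is presumably why the acceptedEpoch/currentEpoch files described in this report are never rewritten for this case.

```java
// Hypothetical zxid helpers: upper 32 bits = epoch, lower 32 bits = counter.
final class Zxid {
    private Zxid() {}

    static final long COUNTER_MASK = 0xffffffffL;

    static long make(long epoch, long counter) {
        return (epoch << 32) | (counter & COUNTER_MASK);
    }

    static long epoch(long zxid) {
        return zxid >>> 32;
    }

    static long counter(long zxid) {
        return zxid & COUNTER_MASK;
    }
}
```

For example, `Zxid.make(1, 0xffffffffL) + 1` yields a zxid whose epoch is 2 and whose counter is 0.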
215756 No Perforce job exists for this issue. 1 32641
8 years, 1 week ago Workaround: there is a simple workaround for this issue. Force a leader re-election before the lower 32 bits reach 0xffffffff.

Most users won't even see this given the number of writes on a typical installation: say you are doing 500 writes/second, you'd see this after ~3 months if the quorum is stable; any change (such as upgrading the server software) would cause the zxid counter to be reset, thereby staving off this issue for another period.
0|i05y7j:
ZooKeeper ZOOKEEPER-1277

servers stop serving when lower 32bits of zxid roll over

Bug Resolved Critical Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 02/Nov/11 12:46   28/Feb/19 15:20 15/Mar/12 12:55 3.3.3 3.3.5, 3.4.4, 3.5.0 server   0 11   When the lower 32 bits of a zxid "roll over" (the zxid is a 64-bit number, but the upper 32 bits are considered the epoch number), the epoch number (upper 32 bits) is incremented and the lower 32 bits start at 0 again.

This should work fine, however in the current 3.3 branch the followers see this as a NEWLEADER message, which it's not, and effectively stop serving clients. Attached clients seem to eventually time out given that heartbeats (or any operation) are no longer processed. The follower doesn't recover from this.

I've tested this out on 3.3 branch and confirmed this problem, however I haven't tried it on 3.4/3.5. It may not happen on the newer branches due to ZOOKEEPER-335, however there is certainly an issue with updating the "acceptedEpoch" files contained in the datadir. (I'll enter a separate jira for that)

215755 No Perforce job exists for this issue. 8 12511
2 years, 43 weeks, 1 day ago Workaround: there is a simple workaround for this issue. Force a leader re-election before the lower 32 bits reach 0xffffffff.

Most users won't even see this given the number of writes on a typical installation: say you are doing 500 writes/second, you'd see this after ~3 months if the quorum is stable; any change (such as upgrading the server software) would cause the zxid counter to be reset, thereby staving off this issue for another period.
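The "~3 months" estimate checks out: the counter has 2^32 values, so at a steady 500 writes/second it is exhausted after about 8.59 million seconds, roughly 99 days. A small helper, assuming a constant write rate (`RolloverEstimate` is a hypothetical name), that a monitoring script could feed with the counter taken from the `Zxid:` line of `stat`:

```java
// Hypothetical estimate of time left before the zxid counter (lower 32 bits)
// reaches 0xffffffff, assuming a constant write rate.
final class RolloverEstimate {
    private RolloverEstimate() {}

    static final long MAX_COUNTER = 0xffffffffL;
    static final double SECONDS_PER_DAY = 86_400.0;

    static double daysUntilRollover(long currentCounter, double writesPerSecond) {
        long remaining = MAX_COUNTER - currentCounter; // writes left in this epoch
        return remaining / writesPerSecond / SECONDS_PER_DAY;
    }
}
```

Starting from a fresh epoch at 500 writes/second this gives about 99.4 days; the workaround above amounts to forcing a re-election well before the estimate reaches zero.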
Reviewed
0|i02hyn:
ZooKeeper ZOOKEEPER-1276

ZOOKEEPER-1198 ZKDatabase should not hold reference to FileTxnSnapLog

Sub-task Open Minor Unresolved Thomas Koch Thomas Koch Thomas Koch 02/Nov/11 12:38   01/May/13 22:29           0 0   The ZkDatabase class contains a reference to a FileTxnSnapLog although it doesn't need it. It has four methods that just forward calls to the instance and two methods that could receive an instance of FileTxnSnapLog instead of referring to a member of _this_. 215751 No Perforce job exists for this issue. 0 42007
8 years, 21 weeks, 1 day ago 0|i07jzz:
ZooKeeper ZOOKEEPER-1275

ZOOKEEPER-233 ZooKeeper client is only caller of server.DataTree.copyStat()

Sub-task Resolved Minor Won't Fix Thomas Koch Thomas Koch Thomas Koch 01/Nov/11 06:01   19/Mar/12 02:18 19/Mar/12 02:18     build, java client   0 1   This static method should be moved out of the o.a.z.server package. To my knowledge it is the only coupling of ZK client code to server code and the server doesn't even call this method. 215497 No Perforce job exists for this issue. 1 33300
8 years, 9 weeks ago 0|i0629z:
ZooKeeper ZOOKEEPER-1274

Support child watches to be displayed with 4 letter zookeeper commands (i.e. wchs, wchp and wchc)

Bug Open Major Unresolved Chris Nauroth amith amith 31/Oct/11 03:06   05/Feb/20 07:16     3.7.0, 3.5.8 server   4 7   Currently the ZooKeeper server displays only data watches (created by exists() and getData()) with the wchs, wchp and wchc four-letter commands.

It would be useful to get information about child watches (created by getChildren()) from the four-letter words as well.




215312 No Perforce job exists for this issue. 2 32642
4 years, 46 weeks, 6 days ago 0|i05y7r:
ZooKeeper ZOOKEEPER-1273

Copy'n'pasted unit test

Bug Resolved Trivial Fixed Thomas Koch Thomas Koch Thomas Koch 30/Oct/11 14:33   01/Nov/11 06:57 31/Oct/11 16:06   3.5.0 tests   0 2   Probably due to the use of a legacy VCS, some code was duplicated when you moved from SourceForge to Apache (ZOOKEEPER-38). The following file can be deleted:
src/java/test/org/apache/zookeeper/server/DataTreeUnitTest.java

src/java/test/org/apache/zookeeper/test/DataTreeTest.java was an exact copy of the above until ZOOKEEPER-1046 added an additional test case only to the latter.

Do I need to upload a patch file for this?
215279 No Perforce job exists for this issue. 1 32643
8 years, 21 weeks, 2 days ago
Reviewed
0|i05y7z:
ZooKeeper ZOOKEEPER-1272

ZooKeeper.multi() could violate API if server misbehaves

Bug Open Minor Unresolved Thomas Koch Thomas Koch Thomas Koch 29/Oct/11 09:36   30/Oct/11 13:11           0 0   The client API method ZooKeeper.multi() promises that the KeeperException it throws when one of the multi ops fails contains a list of individual results.

The method ZooKeeper.multiInternal(), however, throws a KeeperException if the returned response header has an error code != 0. This should never happen unless the server misbehaves, since the error code of a multi response is always zero, but I managed to trigger this code path with my refactorings.
215218 No Perforce job exists for this issue. 0 32644
8 years, 21 weeks, 4 days ago 0|i05y87:
ZooKeeper ZOOKEEPER-1271

testEarlyLeaderAbandonment failing on solaris - clients not retrying connection

Bug Closed Blocker Fixed Mahadev Konar Patrick D. Hunt Patrick D. Hunt 28/Oct/11 18:33   23/Nov/11 14:21 02/Nov/11 17:59 3.3.4, 3.4.0, 3.5.0 3.3.4, 3.4.0, 3.5.0 java client   0 2   See:
https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch34_solaris/1/testReport/junit/org.apache.zookeeper.server.quorum/QuorumPeerMainTest/testEarlyLeaderAbandonment/

Notice that the clients attempt to connect before the servers have bound, then 30 seconds later, after seemingly no further client activity we see:

2011-10-28 21:40:56,828 [myid:] - INFO [main-SendThread(localhost:11227):ClientCnxn$SendThread@1057] - Client session timed out, have not heard from server in 30032ms for sessionid 0x0, closing socket connection and attempting reconnect


I believe this is different from ZOOKEEPER-1270 because in the 1270 case it seems like the clients are attempting to connect but the servers are not accepting (notice the stat commands are being dropped due to no server running)
215187 No Perforce job exists for this issue. 6 32645
8 years, 20 weeks, 1 day ago
Reviewed
0|i05y8f:
ZooKeeper ZOOKEEPER-1270

testEarlyLeaderAbandonment failing intermittently, quorum formed, no serving.

Bug Closed Blocker Fixed Flavio Paiva Junqueira Patrick D. Hunt Patrick D. Hunt 28/Oct/11 18:25   23/Nov/11 14:22 05/Nov/11 07:46   3.4.0, 3.5.0 server   0 4   Looks pretty serious - quorum is formed but no clients can attach. Will attach logs momentarily.

This test was introduced in the following commit (all three JIRAs were committed at once):
ZOOKEEPER-335. zookeeper servers should commit the new leader txn to their logs.
ZOOKEEPER-1081. modify leader/follower code to correctly deal with new leader
ZOOKEEPER-1082. modify leader election to correctly take into account current
215186 No Perforce job exists for this issue. 15 32646
8 years, 20 weeks, 3 days ago
Reviewed
0|i05y8n:
ZooKeeper ZOOKEEPER-1269

Multi deserialization issues

Bug Closed Major Fixed Camille Fournier Camille Fournier Camille Fournier 28/Oct/11 17:34   16/Dec/11 20:33 09/Dec/11 17:25 3.4.0 3.4.1, 3.5.0 server   0 2   From the mailing list:

FileTxnSnapLog.restore contains a code block handling a NODEEXISTS failure during deserialization. The problem is explained there in a code comment. The code block however is only executed for a CREATE txn, not for a multiTxn containing a CREATE.

Even if the code block mentioned above were also executed for multi transactions, it would need to be adapted for them. What if, after the first failed transaction in a multi txn during deserialization, subsequent transactions in the same multi would also have failed?
We don't know, since the first failed transaction hides the information about the remaining transactions.
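A sketch of the first point, using a hypothetical transaction model (`TxnOp`, `CreateOp`, `SetDataOp`, `MultiOp` are not ZooKeeper's jute classes): the NODEEXISTS special case has to find CREATE operations nested inside a multi, not just top-level ones. It does not address the second point; once the first sub-operation fails, the outcome of the remaining ones is still unknown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical txn model, only for illustrating the restore check.
abstract class TxnOp {}

final class CreateOp extends TxnOp {
    final String path;
    CreateOp(String path) { this.path = path; }
}

final class SetDataOp extends TxnOp {
    final String path;
    SetDataOp(String path) { this.path = path; }
}

final class MultiOp extends TxnOp {
    final List<TxnOp> ops;
    MultiOp(TxnOp... ops) { this.ops = Arrays.asList(ops); }
}

final class ReplayChecks {
    private ReplayChecks() {}

    // Paths created by a txn, including every CREATE nested in a multi,
    // so a NODEEXISTS during replay can be matched against all of them.
    static List<String> createdPaths(TxnOp txn) {
        List<String> out = new ArrayList<>();
        collect(txn, out);
        return out;
    }

    private static void collect(TxnOp txn, List<String> out) {
        if (txn instanceof CreateOp) {
            out.add(((CreateOp) txn).path);
        } else if (txn instanceof MultiOp) {
            for (TxnOp op : ((MultiOp) txn).ops) {
                collect(op, out); // recurse into the multi's sub-operations
            }
        }
    }
}
```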
215177 No Perforce job exists for this issue. 1 32647
8 years, 15 weeks, 5 days ago
Reviewed
0|i05y8v:
ZooKeeper ZOOKEEPER-1268

problems with read only mode, intermittent test failures and ERRORs in the log

Bug Closed Blocker Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 28/Oct/11 14:01   23/Nov/11 14:22 01/Nov/11 03:15 3.4.0, 3.5.0 3.4.0, 3.5.0 server   0 1   I'm having a lot of problems testing the 3.4.0 release candidate (0). I'm seeing frequent failures in RO unit tests, also the solaris tests are broken on jenkins, some of which is due to RO mode:
https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_trunk_solaris/30/#showFailuresLink

I'm also seeing ERROR level messages in the logs during test runs that are a result of attempting to start RO mode.

Given this is a new feature, one that could be very disruptive, I think we need to control whether the feature is enabled or not through a config option (system prop is fine), disabled by default.

I'll look at the RO mode tests to see if I can find the cause of the failures on solaris, but I may also turn off these tests for the time being. (I need to look at this further).


I'm marking this as a blocker for 3.4.0, Mahadev LMK if you feel similarly or whether I should be shooting for 3.4.1 with this. (or perhaps I'm just way off in general).

215148 No Perforce job exists for this issue. 2 32648
8 years, 21 weeks, 2 days ago
Reviewed
0|i05y93:
ZooKeeper ZOOKEEPER-1267

ZOOKEEPER-1198 closeSession flag in finalRequestProcessor is superfluous

Sub-task Resolved Trivial Fixed Thomas Koch Thomas Koch Thomas Koch 28/Oct/11 10:16   29/Oct/11 06:56 28/Oct/11 13:00         0 0   The variable can be removed; where it is evaluated, one can instead just check whether request.type is OpCode.closeSession. This removes one indirection from your head in a method that's long enough already. 215118 No Perforce job exists for this issue. 1 33301
8 years, 21 weeks, 5 days ago
Reviewed
0|i062a7:
ZooKeeper ZOOKEEPER-1266

ZOOKEEPER-1198 "request.getHdr() != null" and "isQuorum" are identical

Sub-task Open Minor Unresolved Thomas Koch Thomas Koch Thomas Koch 28/Oct/11 09:58   28/Oct/11 12:52           0 1   FinalRequestProcessor has this code block:
{code:java}
if (request.getHdr() != null) {
    ... SNIP ...
}
// do not add non quorum packets to the queue.
if (request.isQuorum()) {
    zks.getZKDatabase().addCommittedProposal(request);
}
{code}

Both conditions are equivalent so the two if blocks could actually be merged to one block.
215113 No Perforce job exists for this issue. 1 42008
8 years, 21 weeks, 6 days ago 0|i07k07:
ZooKeeper ZOOKEEPER-1265

Normalize switch cases lists on request types

Bug Resolved Major Fixed Thomas Koch Thomas Koch Thomas Koch 28/Oct/11 09:35   29/Oct/11 06:56 28/Oct/11 12:38         0 2   As discussed on the list, it's probably an error that the ReadOnlyRequestProcessor does not have multi alongside the other write operations.
Adding check to the lists may not make a difference for now, since the ZK client does not expose check as a first-level request but only encapsulated inside a multi request. However, from a logical point of view, check belongs in these lists.
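One way to keep these lists from drifting apart is a single shared definition of which request types are writes. A minimal sketch with a hypothetical `Op` enum (ZooKeeper itself uses int constants in `ZooDefs.OpCode`), listing multi and check alongside the other write operations:

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical request types, for illustration only.
enum Op { CREATE, DELETE, SET_DATA, SET_ACL, CHECK, MULTI, GET_DATA, EXISTS, GET_CHILDREN }

final class OpKinds {
    private OpKinds() {}

    // Single source of truth for "write" requests, so every processor
    // (e.g. a read-only processor rejecting writes) uses the same list.
    static final Set<Op> WRITES = Collections.unmodifiableSet(
            EnumSet.of(Op.CREATE, Op.DELETE, Op.SET_DATA, Op.SET_ACL, Op.CHECK, Op.MULTI));

    static boolean isWrite(Op op) {
        return WRITES.contains(op);
    }
}
```

Each processor's switch could then delegate to `OpKinds.isWrite` instead of repeating its own case list.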
215109 No Perforce job exists for this issue. 1 32649
8 years, 21 weeks, 5 days ago
Reviewed
0|i05y9b:
ZooKeeper ZOOKEEPER-1264

FollowerResyncConcurrencyTest failing intermittently

Bug Closed Blocker Fixed Camille Fournier Patrick D. Hunt Patrick D. Hunt 28/Oct/11 00:23   23/Nov/11 14:22 05/Nov/11 16:58 3.3.3, 3.4.0, 3.5.0 3.3.4, 3.4.0, 3.5.0 tests   0 5   ZOOKEEPER-1282, ZOOKEEPER-1291 The FollowerResyncConcurrencyTest test is failing intermittently.

saw the following on 3.4:
{noformat}
junit.framework.AssertionFailedError: Should have same number of
ephemerals in both followers expected:<11741> but was:<14001>
at org.apache.zookeeper.test.FollowerResyncConcurrencyTest.verifyState(FollowerResyncConcurrencyTest.java:400)
at org.apache.zookeeper.test.FollowerResyncConcurrencyTest.testResyncBySnapThenDiffAfterFollowerCrashes(FollowerResyncConcurrencyTest.java:196)
at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
{noformat}
215052 No Perforce job exists for this issue. 19 32650
8 years, 20 weeks, 4 days ago Revision 1198053 0|i05y9j:
ZooKeeper ZOOKEEPER-1263

fix handling of min/max session timeout value initialization

Task Resolved Major Fixed Rakesh Radhakrishnan Patrick D. Hunt Patrick D. Hunt 27/Oct/11 17:47   20/Jul/15 06:54 25/Mar/14 17:14   3.5.0 server   0 5   ZOOKEEPER-1213, ZOOKEEPER-1227 This task rolls up the changes in subtasks for easier commit. (I'm about to submit the rolled up patch) 215009 No Perforce job exists for this issue. 5 42009
6 years, 1 day ago trunk: http://svn.apache.org/viewvc?view=revision&revision=1581522
Incompatible change
0|i07k0f:
ZooKeeper ZOOKEEPER-1262

Documentation for Lock recipe has major flaw

Bug Resolved Major Fixed Jordan Zimmerman Jordan Zimmerman Jordan Zimmerman 27/Oct/11 17:46   28/Dec/11 16:18 28/Dec/11 16:18 3.3.3 3.5.0 documentation   0 2   The recipe for Locks documented here: http://zookeeper.apache.org/doc/trunk/recipes.html#sc_recipes_Locks doesn't deal with the problem of create() succeeding but the server crashing before the result is returned. As written, if the server crashes before the result is returned the client can never know what sequential node was created for it. The way to deal with this is to embed the session ID in the node name. The Lock implementation in the ZK distro does this. But, the documentation will lead implementors to write bad code. 215008 No Perforce job exists for this issue. 3 32651
8 years, 13 weeks, 1 day ago Updated recipes to document how to use a GUID to manage a recoverable create() error.
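A sketch of the recovery step, assuming lock nodes are created with a name of the form `<guid>-lock-` and the sequential flag, so ZooKeeper appends a 10-digit zero-padded sequence number. After a connection loss during create(), the client lists the lock node's children and looks for its own GUID instead of creating a second node. The naming scheme and the `LockRecovery` class are hypothetical; only the sequence-suffix format is taken from ZooKeeper's behaviour.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helpers for a GUID-based lock recipe.
final class LockRecovery {
    private LockRecovery() {}

    // Find the node this client created, even if the create() reply was lost.
    static Optional<String> findOwnNode(List<String> children, String guid) {
        String prefix = guid + "-lock-";
        return children.stream().filter(c -> c.startsWith(prefix)).findFirst();
    }

    // Sequence number = the 10-digit suffix appended by ZooKeeper.
    static long sequenceOf(String child) {
        return Long.parseLong(child.substring(child.length() - 10));
    }

    // The lock is held when our node has the lowest sequence number.
    static boolean holdsLock(List<String> children, String ownNode) {
        long own = sequenceOf(ownNode);
        return children.stream().allMatch(c -> sequenceOf(c) >= own);
    }
}
```

If `findOwnNode` returns empty, the create() really did not happen and it is safe to retry it.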
Reviewed
0|i05y9r:
ZooKeeper ZOOKEEPER-1261

Make ZooKeeper code more Dependency Injection compliant.

Improvement Open Major Unresolved Unassigned Mahadev Konar Mahadev Konar 27/Oct/11 14:00   05/Feb/20 07:16     3.7.0, 3.5.8     0 0   Our code base is a little tricky to unit test and also needs fixing to be maintainable in the long term. We should make our components DI compliant, so that they are easier to test and maintain in the long term. This is just an umbrella jira; I am sure we will need a huge code churn to achieve this goal. 214957 No Perforce job exists for this issue. 0 42010
8 years, 22 weeks ago 0|i07k0n:
ZooKeeper ZOOKEEPER-1260

Audit logging in ZooKeeper servers.

New Feature Resolved Major Fixed Mohammad Arshad Mahadev Konar Mahadev Konar 27/Oct/11 13:49   19/Nov/19 05:18 11/Nov/19 07:59   3.6.0 server   6 16 0 42000   Lots of users have had questions on debugging which client changed what znode and what updates went through a znode. We should add audit logging as in Hadoop (look at Namenode Audit logging) to log which client changed what in the zookeeper servers. This could just be a log4j audit logger. 100% 100% 42000 0 pull-request-available 214956 No Perforce job exists for this issue. 2 42011
18 weeks, 3 days ago 0|i07k0v:
ZooKeeper ZOOKEEPER-1259

ZOOKEEPER-1198 central mapping from type to txn record class

Sub-task Open Major Unresolved Thomas Koch Thomas Koch Thomas Koch 27/Oct/11 07:56   05/Feb/20 07:17     3.7.0, 3.5.8     0 2   There are two places where large switch statements do nothing but select the correct Record class according to a txn type. I provided a static map in SerializeUtils from type to Class<? extends Record> and a method to retrieve a new txn Record instance for a type.

Code size reduced by 28 lines.
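A minimal sketch of such a central mapping. It uses a `Supplier` per type instead of a `Class<? extends Record>` to avoid reflection; the record classes and opcode constants below are hypothetical stand-ins for the jute-generated txn classes and real opcodes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-ins for jute txn record classes.
interface TxnRecord {}
final class CreateTxnRecord implements TxnRecord {}
final class DeleteTxnRecord implements TxnRecord {}
final class SetDataTxnRecord implements TxnRecord {}

final class TxnRecords {
    private TxnRecords() {}

    // Illustrative opcode values.
    static final int CREATE = 1, DELETE = 2, SET_DATA = 5;

    private static final Map<Integer, Supplier<? extends TxnRecord>> FACTORIES;

    static {
        Map<Integer, Supplier<? extends TxnRecord>> m = new HashMap<>();
        m.put(CREATE, CreateTxnRecord::new);
        m.put(DELETE, DeleteTxnRecord::new);
        m.put(SET_DATA, SetDataTxnRecord::new);
        FACTORIES = Collections.unmodifiableMap(m);
    }

    // One lookup replaces each duplicated switch statement.
    static TxnRecord newRecord(int type) {
        Supplier<? extends TxnRecord> factory = FACTORIES.get(type);
        if (factory == null) {
            throw new IllegalArgumentException("unknown txn type " + type);
        }
        return factory.get();
    }
}
```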
214897 No Perforce job exists for this issue. 4 42012
5 years, 51 weeks, 3 days ago 0|i07k13:
ZooKeeper ZOOKEEPER-1258

ZOOKEEPER-1198 Move MultiResponse creation out of FinalRequestProcessor

Sub-task Patch Available Major Unresolved Thomas Koch Thomas Koch Thomas Koch 27/Oct/11 07:08   07/Oct/13 21:22           0 1   There is a longish code block in the switch case of the FinalRequestProcessor iterating over rc.multiResult and building a MultiResponse. Moved the code where it belongs, to MultiResponse and OpResult. 214894 No Perforce job exists for this issue. 1 42013
6 years, 24 weeks, 2 days ago 0|i07k1b:
ZooKeeper ZOOKEEPER-1257

ZOOKEEPER-1198 Rename MultiTransactionRecord to MultiRequest

Sub-task Open Critical Unresolved Thomas Koch Thomas Koch Thomas Koch 27/Oct/11 05:51   18/Mar/16 16:03           0 3   Understanding the code behind multi operations doesn't get any easier when the code violates naming consistency.
All other Request classes are called xxxRequest; only for multi is it xxxTransactionRecord! Also, "Transaction" is wrong, because there is the concept of transactions that are transmitted between quorum peers or committed to disk. MultiTransactionRecord, however, is a _Request_ from a client.
214886 No Perforce job exists for this issue. 0 42014
4 years, 6 days ago 0|i07k1j:
ZooKeeper ZOOKEEPER-1256

ClientPortBindTest is failing on Mac OS X

Bug Closed Major Fixed Flavio Paiva Junqueira Daniel Gómez Ferro Daniel Gómez Ferro 27/Oct/11 03:45   17/May/17 23:43 29/Jul/16 19:05   3.5.3, 3.6.0 tests   0 7   Mac OS X ClientPortBindTest is failing consistently on Mac OS X. 214880 No Perforce job exists for this issue. 6 12508
3 years, 33 weeks, 6 days ago
Reviewed
0|i02hxz:
ZooKeeper ZOOKEEPER-1255

ZOOKEEPER-1198 unused fields in DataTree.ProcessTxnResult

Sub-task Open Minor Unresolved Thomas Koch Thomas Koch Thomas Koch 27/Oct/11 03:38   01/May/13 22:29           0 2   The fields zxid, cxid and clientId in ProcessTxnResult are never used. cxid and clientId are used in equals() and hashCode() but the class is never ever used as a key or compared.

Keeping equals() and hashCode() "just in case" is a bad idea:
http://www.infoq.com/news/2011/05/less-code-is-better
214879 No Perforce job exists for this issue. 1 42015
8 years, 21 weeks, 3 days ago 0|i07k1r:
ZooKeeper ZOOKEEPER-1254

test correct watch handling with multi ops

Improvement Resolved Major Fixed Thomas Koch Thomas Koch Thomas Koch 26/Oct/11 10:25   27/Oct/11 06:54 26/Oct/11 12:58         0 1   I was wondering, what happens with watches that would be triggered by a multi op if subsequent ops fail. I didn't find a test for this, wrote one and everything was fine. :-)

The patch contains two additional test cases.
214751 No Perforce job exists for this issue. 1 33302
8 years, 22 weeks ago
Reviewed
0|i062af:
ZooKeeper ZOOKEEPER-1253

ZOOKEEPER-1198 return value of DataTree.createNode is never used

Sub-task Resolved Trivial Fixed Thomas Koch Thomas Koch Thomas Koch 26/Oct/11 09:30   01/May/13 22:29 14/Dec/11 18:40   3.5.0     0 1   createNode returns the unmodified path string which it has received as parameter. Consequently no caller uses the return value. 214745 No Perforce job exists for this issue. 2 33303
8 years, 15 weeks ago
Reviewed
0|i062an:
ZooKeeper ZOOKEEPER-1252

ZOOKEEPER-1198 remove unused method o.a.z.test.AsyncTest.restart()

Sub-task Resolved Trivial Fixed Thomas Koch Thomas Koch Thomas Koch 26/Oct/11 08:55   28/Oct/11 06:55 27/Oct/11 12:24   3.5.0     0 0   see Summary. 214735 No Perforce job exists for this issue. 2 33304
8 years, 21 weeks, 6 days ago
Reviewed
0|i062av:
ZooKeeper ZOOKEEPER-1251

ZOOKEEPER-1198 call checkSession at the beginning of PrepRequestProcessor.pRequest

Sub-task Patch Available Major Unresolved Thomas Koch Thomas Koch Thomas Koch 26/Oct/11 07:07   12/Nov/11 07:15           0 0   There are 6 locations that call checkSession. This can be reduced to one location, which also makes it much clearer in which cases checkSession is or is not called.

Note that the SessionMoved|Expired error is now checked before the marshalling error. However, it shouldn't matter which error gets reported.
214713 No Perforce job exists for this issue. 3 42016
8 years, 19 weeks, 5 days ago
Reviewed
0|i07k1z:
ZooKeeper ZOOKEEPER-1250

ZOOKEEPER-1198 trigger jenkins dummy issue

Sub-task Resolved Trivial Invalid Thomas Koch Thomas Koch Thomas Koch 25/Oct/11 13:09   02/Nov/11 11:58 02/Nov/11 11:58         0 1   Sorry, I don't have my own servers for testing, so I need to upload patches here to run the ZK test suite. 214578 No Perforce job exists for this issue. 9 33305
8 years, 21 weeks, 1 day ago 0|i062b3:
ZooKeeper ZOOKEEPER-1249

jline should be an optional maven dependency

Improvement Resolved Trivial Duplicate Unassigned David Smiley David Smiley 25/Oct/11 11:40   01/Sep/14 03:10 11/Oct/13 12:41     build   0 2   When a project adds a maven dependency to zookeeper, they probably don't want the jline dependency. jline should have <optional>true</optional> in zookeeper's maven pom. 214551 No Perforce job exists for this issue. 0 42017
6 years, 23 weeks, 6 days ago 0|i07k27:
ZooKeeper ZOOKEEPER-1248

ZOOKEEPER-1198 multi transaction sets request.exception without reason

Sub-task Open Major Unresolved Thomas Koch Thomas Koch Thomas Koch 25/Oct/11 09:18   05/Feb/20 07:16     3.7.0, 3.5.8     0 1   I'm trying to understand the purpose of the exception field in request. This isn't made easier by the fact that the multi case in PrepRequestProcessor sets the exception without reason.

The only code that calls request.getException() is in FinalRequestProcessor and this code only acts when the operation _is not_ a multi operation.
214531 No Perforce job exists for this issue. 3 42018
8 years, 15 weeks, 1 day ago 0|i07k2f:
ZooKeeper ZOOKEEPER-1247

ZOOKEEPER-1198 dead code in PrepRequestProcessor.pRequest multi case

Sub-task Resolved Major Fixed Thomas Koch Thomas Koch Thomas Koch 25/Oct/11 07:24   28/Oct/11 06:55 27/Oct/11 18:58   3.5.0     0 0   There's an if statement in the for loop which sets request.hdr.type and request.txn in case an error happened in a preceding multi op. However, hdr and txn are overwritten anyway at the end of the multi case. The values set are only used a bit later to serialize them. This would be better achieved with local variables holding the temporary hdr and txn.

Also the if condition (ke == null) in the catch block is pointless, since the surrounding if(ke != null) makes sure that the catch block could only ever be reached in a loop where ke == null.
214513 No Perforce job exists for this issue. 1 33306
8 years, 21 weeks, 6 days ago
Reviewed
0|i062bb:
ZooKeeper ZOOKEEPER-1246

ZOOKEEPER-1198 Dead code in PrepRequestProcessor catch Exception block

Sub-task Closed Blocker Fixed Camille Fournier Thomas Koch Thomas Koch 25/Oct/11 05:57   23/Nov/11 14:22 02/Nov/11 14:31   3.4.0, 3.5.0     0 2   This is a regression introduced by ZOOKEEPER-965 (multi transactions). The catch(Exception e) block in PrepRequestProcessor.pRequest contains an if block with condition request.getHdr() != null. This condition will always evaluate to false since the changes in ZOOKEEPER-965.

This is caused by a change in sequence: before ZK-965, the txnHeader was set _before_ the deserialization of the request. Since then, the deserialization happens before request.setHdr is called. So the following RequestProcessors won't see the request as a failed one but as a read request, since it doesn't have a hdr set.

Notes:
- it is very bad practice to catch Exception. The block should catch IOException instead.
- The check whether the TxnHeader is set in the request is used in several places to decide whether the request is a read or a write request. It isn't obvious to a newcomer what it means for a request to have a hdr set or not.
- at the beginning of pRequest the hdr and txn of request are set to null. However, there is no chance that these fields could ever be non-null at this point, although the code suggests that this could be the case. There should instead be an assertion that confirms that these fields are indeed null. The practice of doing things "just in case", even if there is no chance that this case could happen, is a very stinky code smell and means that the code isn't understandable or trustworthy.
- The multi transaction switch case block in pRequest is very hard to read, because it misuses the request.{hdr|txn} fields as local variables.
214508 No Perforce job exists for this issue. 6 33307
8 years, 21 weeks, 1 day ago 0|i062bj:
ZooKeeper ZOOKEEPER-1245

ZOOKEEPER-1198 fix compiler warnings in contrib loggraph

Sub-task Open Major Unresolved Unassigned Thomas Koch Thomas Koch 25/Oct/11 03:00   25/Oct/11 03:01           0 2   Eclipse shows around 300 compiler warnings in loggraph, many of them no-brainers like missing generics. 214493 No Perforce job exists for this issue. 0 42019
8 years, 22 weeks, 2 days ago 0|i07k2n:
ZooKeeper ZOOKEEPER-1244

ZOOKEEPER-1198 resolve remaining compiler warnings

Sub-task Open Major Unresolved Unassigned Thomas Koch Thomas Koch 25/Oct/11 02:59   25/Oct/11 02:59           0 1   The ZooKeeper main codebase, including tests, currently triggers only 5 warnings in Eclipse. The remaining 5 warnings should be fixed by people who know these classes better than I do.
Once the warnings are down to zero it could be made a policy to keep it that way.

The contrib loggraph however has around 300 warnings, many of them missing generics.
214492 No Perforce job exists for this issue. 0 42020
8 years, 22 weeks, 2 days ago 0|i07k2v:
ZooKeeper ZOOKEEPER-1243

New 4lw for short simple monitoring ldck

Improvement Resolved Blocker Won't Fix Camille Fournier Camille Fournier Camille Fournier 24/Oct/11 10:43   17/Nov/11 01:05 24/Oct/11 18:09 3.3.3, 3.4.0   server   0 0   The existing monitoring fails so often due to https://issues.apache.org/jira/browse/ZOOKEEPER-1197 that we need a workaround. This introduces a short 4lw called ldck that just runs ServerStats.toString to get information about the server's leadership status. 214355 No Perforce job exists for this issue. 3 33308
8 years, 22 weeks, 3 days ago Srvr command duplicates. 0|i062br:
ZooKeeper ZOOKEEPER-1242

Repeat add watcher, memory leak

Bug Open Major Unresolved Peng Futian Peng Futian Peng Futian 23/Oct/11 21:35   14/Dec/19 06:07   3.3.3 3.7.0 c client   1 1 3600 3600 0% Redhat linux When I repeatedly add a watcher, there is a memory leak.
0% 0% 3600 3600 patch 214293 No Perforce job exists for this issue. 1 32652
8 years, 16 weeks, 1 day ago 0|i05y9z:
ZooKeeper ZOOKEEPER-1241

Typo in ZooKeeper Recipes and Solutions documentation

Bug Resolved Minor Fixed Jingguo Yao Jingguo Yao Jingguo Yao 23/Oct/11 11:10   24/Oct/11 06:53 24/Oct/11 04:01 3.3.3 3.5.0 documentation   0 1 300 300 0% In "if p is the lowest process node in L, wait on highest process node in P", "P" should be "L". 0% 0% 300 300 214278 No Perforce job exists for this issue. 1 32653
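For context, the corrected sentence is one step of the double-barrier Leave procedure in the recipes document. The minimal sketch below covers only the decision taken on each pass of that loop, assuming `children` is L (already filtered to process nodes) and `p` is this process's node name. Ordering nodes by name is an assumption of the sketch, all class and method names are hypothetical, and no ZooKeeper API is called.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class DoubleBarrierLeave {
    enum Kind { EXIT, DELETE_SELF_AND_EXIT, WAIT_ON_HIGHEST, DELETE_SELF_AND_WAIT_ON_LOWEST }

    /** Decision for one loop iteration; watchTarget is the node to wait on, if any. */
    static final class Step {
        final Kind kind;
        final String watchTarget;

        Step(Kind kind, String watchTarget) {
            this.kind = kind;
            this.watchTarget = watchTarget;
        }
    }

    static Step decide(List<String> children, String p) {
        if (children.isEmpty()) {
            return new Step(Kind.EXIT, null);                 // no children: exit
        }
        List<String> l = new ArrayList<>(children);
        Collections.sort(l);                                  // assumption: order by name
        String lowest = l.get(0);
        String highest = l.get(l.size() - 1);
        if (l.size() == 1 && lowest.equals(p)) {
            return new Step(Kind.DELETE_SELF_AND_EXIT, null); // p is the only node left
        }
        if (p.equals(lowest)) {
            return new Step(Kind.WAIT_ON_HIGHEST, highest);   // "wait on highest process node in L"
        }
        return new Step(Kind.DELETE_SELF_AND_WAIT_ON_LOWEST, lowest);
    }
}
```

After acting on the returned step, the caller would fetch L again and repeat, as in the recipe's "goto 1".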
8 years, 22 weeks, 3 days ago
Reviewed
0|i05ya7:
ZooKeeper ZOOKEEPER-1240

Compiler issue with redhat linux

Bug Open Minor Unresolved Peng Futian Peng Futian Peng Futian 21/Oct/11 22:16   14/Dec/19 06:08   3.3.3 3.7.0 c client   1 3 3600 3600 0% Linux phy 2.6.18-53.el5 #1 SMP Wed Oct 10 16:34:19 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
gcc version 4.1.2 20070626 (Red Hat 4.1.2-14)
When I compile the ZooKeeper C client in my project, I get the following errors:
{noformat}
../../../include/zookeeper/recordio.h:70: error:expected unqualified-id before '__extension__'
../../../include/zookeeper/recordio.h:70: error:expected `)' before '__extension__'
../../../include/zookeeper/recordio.h:70: error:expected unqualified-id before ')' token
{noformat}
0% 0% 3600 3600 patch 113481 No Perforce job exists for this issue. 1 32654
6 years, 30 weeks ago Fix compile error under RedHat linux c client 0|i05yaf:
ZooKeeper ZOOKEEPER-1239

add logging/stats to identify fsync stalls

Improvement Closed Major Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 21/Oct/11 19:48   23/Nov/11 14:22 15/Nov/11 13:31   3.3.4, 3.4.0, 3.5.0 server   0 0   We don't have any logging to identify fsync stalls. It's a somewhat common occurrence (after gc/swap issues) when trying to diagnose pipeline stalls - where outstanding requests start piling up and operational latency increases.

We should have some sort of logging around this, e.g. if the fsync time exceeds some limit, log a warning.

It would also be useful to publish "stat" information related to this, i.e. min/avg/max latency for fsync.

This should also be exposed through JMX.
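A minimal sketch of the proposed warning plus min/avg/max statistics, timing a single FileChannel.force call with System.nanoTime. The class name and the threshold parameter are hypothetical (not an existing ZooKeeper setting); the getters are the kind of values one might expose through a JMX MBean.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

final class FsyncTimer {
    private static final Logger LOG = Logger.getLogger(FsyncTimer.class.getName());

    private final long warnThresholdMs; // hypothetical knob, not a real ZooKeeper property
    private long count;
    private long totalNanos;
    private long minNanos = Long.MAX_VALUE;
    private long maxNanos;

    FsyncTimer(long warnThresholdMs) {
        this.warnThresholdMs = warnThresholdMs;
    }

    /** Forces the channel's data to disk, records the latency and warns if it was slow. */
    synchronized void timedForce(FileChannel channel) throws IOException {
        long start = System.nanoTime();
        channel.force(false);
        long elapsed = System.nanoTime() - start;
        count++;
        totalNanos += elapsed;
        minNanos = Math.min(minNanos, elapsed);
        maxNanos = Math.max(maxNanos, elapsed);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(elapsed);
        if (elapsedMs > warnThresholdMs) {
            LOG.warning("fsync took " + elapsedMs + " ms, over the " + warnThresholdMs + " ms threshold");
        }
    }

    synchronized long count() { return count; }
    synchronized long minMs() { return count == 0 ? 0 : TimeUnit.NANOSECONDS.toMillis(minNanos); }
    synchronized long avgMs() { return count == 0 ? 0 : TimeUnit.NANOSECONDS.toMillis(totalNanos / count); }
    synchronized long maxMs() { return TimeUnit.NANOSECONDS.toMillis(maxNanos); }
}
```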


113472 No Perforce job exists for this issue. 2 33309
8 years, 19 weeks, 1 day ago committed to 3.3.4, 3.4, trunk rev 1202360 0|i062bz:
ZooKeeper ZOOKEEPER-1238

when the linger time was changed for NIO the patch missed Netty

Bug Closed Major Fixed Skye Wanderman-Milne Patrick D. Hunt Patrick D. Hunt 20/Oct/11 12:58   13/Mar/14 14:17 12/Jan/14 16:36 3.4.0, 3.5.0 3.4.6, 3.5.0 server   0 5   from NettyServerCnxn:

bq. bootstrap.setOption("child.soLinger", 2);

See ZOOKEEPER-1049
92391 No Perforce job exists for this issue. 1 12497
6 years, 2 weeks ago
Reviewed
0|i02hvj:
ZooKeeper ZOOKEEPER-1237

ERRORs being logged when queued responses are sent after socket has closed.

Bug Resolved Major Duplicate Unassigned Patrick D. Hunt Patrick D. Hunt 20/Oct/11 12:54   30/May/18 20:16 24/Jan/17 18:58 3.3.4, 3.4.0, 3.5.0 3.4.10 server   16 39   After applying ZOOKEEPER-1049 to 3.3.3 (I believe the same problem exists in 3.4/3.5 but haven't tested this) I'm seeing the following exception more frequently:

{noformat}
Oct 19, 1:31:53 PM ERROR
Unexpected Exception:
java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:418)
at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1509)
at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:367)
at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
{noformat}

This is a long-standing problem where we try to send a response after the socket has been closed. Prior to ZOOKEEPER-1049 this issue happened much less frequently (2 sec linger), but I believe it was possible. The timing window is just wider now.
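One defensive pattern, offered only as a sketch (this issue was closed as a duplicate, and the class below is hypothetical, not NIOServerCnxn code): check the key before touching its interest set, and still catch CancelledKeyException because the key can be cancelled between the check and the call.

```java
import java.nio.channels.CancelledKeyException;
import java.nio.channels.SelectionKey;

final class SafeInterestOps {
    /**
     * Adds OP_WRITE to the key's interest set unless the key has already been
     * cancelled (for example because the connection was closed). Returns false
     * when the queued response should simply be dropped.
     */
    static boolean enableWrite(SelectionKey key) {
        if (!key.isValid()) {
            return false;
        }
        try {
            key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
            return true;
        } catch (CancelledKeyException e) {
            // Lost the race with close(): the key was cancelled after isValid().
            return false;
        }
    }
}
```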

92387 No Perforce job exists for this issue. 1 32655
3 years, 13 weeks, 1 day ago 0|i05yan:
ZooKeeper ZOOKEEPER-1236

Security uses proprietary Sun APIs

Bug Resolved Major Fixed Adalberto Medeiros Patrick D. Hunt Patrick D. Hunt 20/Oct/11 12:05   04/Jul/12 14:25 30/Jun/12 02:31 3.4.0, 3.4.3 3.4.4, 3.5.0 server   0 5   See HADOOP-7211 - Recent kerberos integration resulted in the same issue in ZK.

{noformat}
[javac] /home/phunt/dev/zookeeper/src/java/main/org/apache/zookeeper/server/auth/KerberosName.java:88: warning: sun.security.krb5.KrbException is Sun proprietary API and may be removed in a future release
[javac] } catch (KrbException ke) {
{noformat}
92372 No Perforce job exists for this issue. 2 32656
7 years, 38 weeks, 5 days ago
Reviewed
0|i05yav:
ZooKeeper ZOOKEEPER-1235

ZOOKEEPER-1198 store KeeperException messages in the Code enum

Sub-task Patch Available Major Unresolved Thomas Koch Thomas Koch Thomas Koch 19/Oct/11 06:08   05/Feb/20 07:11     3.7.0, 3.5.8     0 2   Enums are just objects that can have properties. So instead of switching on the code integer, the message can be stored in the enum.

OK (Ok) becomes OK (Ok, "ok")

getCodeMessage(Code code) just returns code.getMessage()
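A sketch of the idea on a small subset of codes. The enum name is hypothetical so it is not mistaken for KeeperException.Code, and the integer values and messages are given as we recall them from KeeperException; verify them against the source before reuse.

```java
import java.util.HashMap;
import java.util.Map;

enum CodeSketch {
    OK(0, "ok"),
    SYSTEMERROR(-1, "SystemError"),
    NONODE(-101, "NoNode"),
    BADVERSION(-103, "BadVersion"),
    NODEEXISTS(-110, "NodeExists");

    private static final Map<Integer, CodeSketch> BY_INT = new HashMap<>();
    static {
        for (CodeSketch c : values()) {
            BY_INT.put(c.intValue, c);
        }
    }

    private final int intValue;
    private final String message;

    CodeSketch(int intValue, String message) {
        this.intValue = intValue;
        this.message = message;
    }

    int intValue() { return intValue; }

    String getMessage() { return message; }

    /** Lookup by wire value; returns null for unknown codes. */
    static CodeSketch get(int code) { return BY_INT.get(code); }

    /** Replaces the switch over the integer code. */
    static String getCodeMessage(CodeSketch code) { return code.getMessage(); }
}
```

Adding a new code then means adding one constant with its message, instead of updating a separate switch.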
89201 No Perforce job exists for this issue. 2 42021
3 years, 39 weeks, 2 days ago 0|i07k33:
ZooKeeper ZOOKEEPER-1234

ZOOKEEPER-1198 basic cleanup in LearnerHandler

Sub-task Open Major Unresolved Thomas Koch Thomas Koch Thomas Koch 18/Oct/11 04:06   27/Oct/11 18:47           0 1   - order class members: properties, constructor, methods
- make properties private final
- rename version to protocolVersion
- The integer value 0x10000 should be extracted to a constant with a declarative name. But since I don't yet fully understand its purpose, I've no idea for the name of the constant.
- Initialize properties BinaryInputArchive ia, BinaryOutputArchive oa and BufferedOutputStream bufferedOutput in the constructor so that they can be made final.
- Remove call to sock.setSoTimeout. Both users of the class set the socket timeout themselves anyway. This also removes a link to the Leader class.
- remove unused method packetToString.
88767 No Perforce job exists for this issue. 2 42022
8 years, 22 weeks ago 0|i07k3b:
ZooKeeper ZOOKEEPER-1233

ZOOKEEPER-1198 throw RuntimeExceptions for Exceptions that "should never happen"

Sub-task Open Major Unresolved Unassigned Thomas Koch Thomas Koch 17/Oct/11 07:07   17/Oct/11 07:07           0 0   See Effective Java, 2nd ed., item 65 ("Don't ignore exceptions"). If you're really sure that the exception will never occur, then you shouldn't be afraid to rethrow it. 88133 No Perforce job exists for this issue. 0 42023
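A self-contained illustration of that advice, using a checked exception that genuinely cannot occur. The class and method are hypothetical, not ZooKeeper code.

```java
import java.io.UnsupportedEncodingException;

final class NeverHappens {
    static byte[] utf8(String s) {
        try {
            // Every Java platform is required to support UTF-8,
            // so this checked exception cannot occur in practice.
            return s.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Don't swallow it: if the "impossible" happens, fail loudly.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }
}
```

Where an API offers a variant without the checked exception (here `String.getBytes(StandardCharsets.UTF_8)`), using it removes the catch block entirely.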
8 years, 23 weeks, 3 days ago 0|i07k3j:
ZooKeeper ZOOKEEPER-1232

remove unused o.a.z.server.util.Profiler

Improvement Resolved Minor Fixed Thomas Koch Thomas Koch Thomas Koch 17/Oct/11 05:34   15/Dec/11 06:58 14/Dec/11 18:26   3.5.0     0 1   The class is not used, and keeping it wrongly suggests to people that this is the right way to do micro-benchmarks on the JVM. Worse, it suggests that micro-benchmarks are the right way to approach Java performance issues.

Quote from http://code.google.com/p/caliper/wiki/JavaMicrobenchmarks
"Why would I ever write a microbenchmark then?

Most of the time, you shouldn't! Instead, slavishly follow a principle of simple, clear coding that avoids clever optimizations. This is the type of code that JITs of the present and future are most likely to know how to optimize themselves. And that's a job which truly should be theirs, not yours. "

Tools to do microbenchmarks:
http://code.google.com/p/caliper/ (from the team that also does Guava, the Google Java library, recommended by Joshua Bloch himself)
http://hype-free.blogspot.com/2010/01/choosing-java-profiler.html
http://www.infoq.com/articles/java-profiling-with-open-source
http://java.net/projects/japex


Joshua Bloch on Performance Anxiety:
http://java.dzone.com/articles/joshua-bloch-performance (follow link to parleys)


87737 No Perforce job exists for this issue. 1 33310
8 years, 15 weeks ago
Reviewed
0|i062c7:
ZooKeeper ZOOKEEPER-1231

ZOOKEEPER-1198 refactor int constants in o.a.z.s.q.Leader to enum

Sub-task Open Major Unresolved Thomas Koch Thomas Koch Thomas Koch 17/Oct/11 03:42   18/Oct/11 14:30           0 0   There are a couple of magic numbers in Leader, representing QuorumPacket types, like DIFF, TRUNC, SNAP, OBSERVERINFO, NEWLEADER, FOLLOWERINFO... These should be made an enum instead. 87704 No Perforce job exists for this issue. 0 42024
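A sketch of such an enum for the types named above; the remaining packet types would be added the same way. The enum name is hypothetical, and the numeric codes are the int constants as we recall them from Leader, so check them against Leader.java before relying on them. The int stays on the wire (QuorumPacket carries an int type), and the enum is mapped at the boundary.

```java
import java.util.HashMap;
import java.util.Map;

enum PacketType {
    // Codes as recalled from Leader's int constants; verify before use.
    NEWLEADER(10),
    FOLLOWERINFO(11),
    DIFF(13),
    TRUNC(14),
    SNAP(15),
    OBSERVERINFO(16);

    private static final Map<Integer, PacketType> BY_CODE = new HashMap<>();
    static {
        for (PacketType t : values()) {
            BY_CODE.put(t.code, t);
        }
    }

    private final int code;

    PacketType(int code) {
        this.code = code;
    }

    /** Value to put into the int type field of a QuorumPacket. */
    int code() { return code; }

    /** Maps a received int type back to the enum, rejecting unknown values. */
    static PacketType fromWire(int code) {
        PacketType t = BY_CODE.get(code);
        if (t == null) {
            throw new IllegalArgumentException("unknown packet type " + code);
        }
        return t;
    }
}
```

Unknown codes then fail fast at the boundary instead of silently falling through a switch default.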
8 years, 23 weeks, 2 days ago 0|i07k3r:
ZooKeeper ZOOKEEPER-1230

ZOOKEEPER-1198 Cleanup FileTxnLog

Sub-task Open Major Unresolved Unassigned Thomas Koch Thomas Koch 15/Oct/11 13:15   01/May/13 22:29           0 0   - remove Interface TxnLog. The discussion on the mailing list (subject: "Get rid of unnecessary Interfaces") didn't give a definite No...?
- make things private where possible
- does preAllocSize need to be static and therefore global?
- the append method has one big if statement from beginning to end. Turn this into an early return
- new private method to initialize a new logStream if logStream == null
- move the check for a faulty transaction into the method o.a.z.s.persistence.Util.marshallTxnEntry
- marshallTxnEntry is only ever used from the append method of FileTxnLog. However, I've seen the same code somewhere else...
- new private method that returns a checksum for a given bytebuffer and length
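A sketch of the proposed checksum helper. It assumes the buffer is the byte[] produced by marshallTxnEntry and that the algorithm is Adler32, which is, as far as we know, what FileTxnLog's makeChecksumAlgorithm() returns; the class name is hypothetical.

```java
import java.util.zip.Adler32;
import java.util.zip.Checksum;

final class TxnChecksum {
    /** Returns the checksum of the first len bytes of buf. */
    static long checksum(byte[] buf, int len) {
        Checksum crc = new Adler32();
        crc.update(buf, 0, len);
        return crc.getValue();
    }
}
```

In append, the call site would shrink to something like `long crcValue = checksum(buf, buf.length);`.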
86712 No Perforce job exists for this issue. 1 42025
8 years, 12 weeks, 4 days ago 0|i07k3z:
ZooKeeper ZOOKEEPER-1229

C client hashtable_remove redundantly calls hash function

Improvement Resolved Trivial Fixed Harsh J Eric Abbott Eric Abbott 15/Oct/11 04:05   31/Dec/11 05:57 30/Dec/11 16:03 3.3.3 3.5.0 c client   0 0   hashtable_remove appears to call the hash function in consecutive lines. As hash functions are generally cpu intensive, using the hashvalue returned from the first call will result in a performance improvement.

{noformat}
void * /* returns value associated with key */
hashtable_remove(struct hashtable *h, void *k)
...
unsigned int hashvalue, index;

hashvalue = hash(h,k);
index = indexFor(h->tablelength,hash(h,k));
pE = &(h->table[index]);
e = *pE;
{noformat}
newbie 86663 No Perforce job exists for this issue. 1 33311
8 years, 12 weeks, 5 days ago
Reviewed
0|i062cf:
ZooKeeper ZOOKEEPER-1228

ZOOKEEPER-1198 Cleanup SessionTracker

Sub-task Open Major Unresolved Unassigned Thomas Koch Thomas Koch 14/Oct/11 11:10   01/May/13 22:29           0 1   - fix ordering of class members
- Remove Interface Session and rename inner class SessionImpl to Session
- make properties private final where possible
- rename SessionTrackerImpl to LeaderSessionTracker. There's a LearnerSessionTracker, so it makes sense.
- make the following code clearer: what does the bit shifting do?
{code}
public static long initializeNextSession(long id) {
    long nextSid = 0;
    nextSid = (System.currentTimeMillis() << 24) >> 8;
    nextSid = nextSid | (id << 56);
    return nextSid;
}
{code}
- replace the inner class SessionSet by a normal Set
- make SessionTrackerImpl implement Runnable
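One way to answer the bit-shifting question is to isolate the quoted arithmetic. In this sketch (hypothetical class; the clock is passed in so it can be tested), the low 40 bits of the millisecond timestamp land in bits 16..55, the server id in bits 56..63, and the low 16 bits start at zero. Because `>>` is an arithmetic shift, once bit 39 of the timestamp is set (roughly April 2022 until September 2039) the shifted value is negative and its sign bits fill bits 56..63, overwriting the server id; an unsigned shift `>>>` would not do that.

```java
final class SessionIdLayout {
    /** Same arithmetic as the quoted initializeNextSession, with the clock passed in. */
    static long initializeNextSession(long id, long nowMillis) {
        // << 24 drops all but the low 40 bits of the time; >> 8 moves them to bits 16..55
        // (arithmetic shift: bit 39 of the time is copied into bits 56..63).
        long nextSid = (nowMillis << 24) >> 8;
        nextSid = nextSid | (id << 56); // server id in the top 8 bits
        return nextSid;
    }

    /** Top 8 bits: the server id, unless sign extension has set them all to 1. */
    static long serverIdOf(long sid) {
        return sid >>> 56;
    }

    /** Bits 16..55: the low 40 bits of the creation time. */
    static long timeBitsOf(long sid) {
        return (sid >>> 16) & ((1L << 40) - 1);
    }
}
```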
85600 No Perforce job exists for this issue. 0 42026
8 years, 23 weeks, 6 days ago 0|i07k47:
ZooKeeper ZOOKEEPER-1227

ZOOKEEPER-1263 ZooKeeper log shows -1 as min/max session timeout if no session timeout value is configured

Sub-task Resolved Minor Fixed Rakesh Radhakrishnan Rakesh Radhakrishnan Rakesh Radhakrishnan 14/Oct/11 10:04   25/Mar/14 17:15 25/Mar/14 17:15 3.3.3 3.5.0 server   0 1   When ZooKeeper is started without configuring the minimum and maximum session timeouts ('minSessionTimeout' and 'maxSessionTimeout'), I'm seeing '-1' as the lower and upper bounds; instead, it should show the default values: tickTime*2 and tickTime*20.

{noformat}
2011-10-14 13:07:18,761 - INFO [main:QuorumPeerConfig@92] - Reading configuration from: /home/amith/CI/source/install/zookeeper/zookeeper1/bin/../conf/zoo.cfg

2011-10-14 13:07:19,118 - INFO [main:QuorumPeer@834] - tickTime set to 2000
2011-10-14 13:07:19,119 - INFO [main:QuorumPeer@845] - minSessionTimeout set to -1
2011-10-14 13:07:19,119 - INFO [main:QuorumPeer@856] - maxSessionTimeout set to -1
{noformat}


*Suggestion*
Move the defaulting logic to QuorumPeerConfig instead of doing it in QuorumPeer.
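A minimal sketch of resolving the defaults once, at configuration time, so the value logged at startup is the value in effect. The class and method names are hypothetical; -1 stands for "not configured", as in the log output above.

```java
final class SessionTimeoutDefaults {
    /** Marker for "not configured". */
    static final int UNSET = -1;

    static int effectiveMin(int tickTime, int configuredMin) {
        return configuredMin == UNSET ? tickTime * 2 : configuredMin;
    }

    static int effectiveMax(int tickTime, int configuredMax) {
        return configuredMax == UNSET ? tickTime * 20 : configuredMax;
    }
}
```

With tickTime set to 2000, the log would then report 4000 and 40000 instead of -1.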
85590 No Perforce job exists for this issue. 1 42027
6 years, 2 days ago 0|i07k4f:
ZooKeeper ZOOKEEPER-1226

ZOOKEEPER-1198 extract version check in separate method in PrepRequestProcessor

Sub-task Resolved Major Fixed Thomas Koch Thomas Koch Thomas Koch 14/Oct/11 07:23   18/Oct/11 07:13 18/Oct/11 07:13         0 1   The following code is repeated 4 times and should be put in a method that either throws the Exception or returns the incremented version (see below).
{code}
version = setDataRequest.getVersion();
int currentVersion = nodeRecord.stat.getVersion();
if (version != -1 && version != currentVersion) {
    throw new KeeperException.BadVersionException(path);
}
version = currentVersion + 1;
{code}

{code}
private static int checkAndIncVersion(int currentVersion, int versionToCompare, String path )
{code}
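A sketch of that method with the repeated logic moved into it. So that it compiles on its own, it uses a local stand-in for KeeperException.BadVersionException, and the method is package-private rather than private only so the checks below can call it; both are assumptions of the sketch.

```java
final class VersionCheck {
    /** Stand-in for KeeperException.BadVersionException. */
    static final class BadVersionException extends Exception {
        BadVersionException(String path) {
            super("BadVersion for " + path);
        }
    }

    /**
     * Returns currentVersion + 1, or throws if versionToCompare is neither
     * -1 ("any version") nor equal to currentVersion.
     */
    static int checkAndIncVersion(int currentVersion, int versionToCompare, String path)
            throws BadVersionException {
        if (versionToCompare != -1 && versionToCompare != currentVersion) {
            throw new BadVersionException(path);
        }
        return currentVersion + 1;
    }
}
```

Each of the four call sites would then reduce to one line, e.g. `version = checkAndIncVersion(nodeRecord.stat.getVersion(), setDataRequest.getVersion(), path);`.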
85562 No Perforce job exists for this issue. 1 33312
8 years, 23 weeks, 2 days ago 0|i062cn:
ZooKeeper ZOOKEEPER-1225

Successive invocations of LeaderElectionSupport.start() will bring the ELECTED node to READY and leave no node in ELECTED state.

Bug Patch Available Major Unresolved Rakesh Radhakrishnan Rakesh Radhakrishnan Rakesh Radhakrishnan 13/Oct/11 06:36   05/Feb/20 07:12   3.3.3 3.7.0, 3.5.8 recipes   0 2   Presently there is no state validation in the start() API, so it can be invoked several times in a row. A second or later invocation moves the client node to the 'READY' state: because an offer was already created by the first invocation of start(), the second invocation calls makeOffer() again, and after determination the node transitions to READY.

This leaves no 'ELECTED' node present, and the client (or the user of the election recipe) will wait indefinitely for an 'ELECTED' node.

Similarly, stop() has no state validation, so repeated invocations can dispatch unnecessary FAILED transition events.


IMO, the LES recipe could have validation logic to reject successive start() and stop() invocations.
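A minimal sketch of such validation using an AtomicBoolean, so that concurrent callers are also handled. The class is hypothetical and only shows the guard; the actual offer and determination logic of LeaderElectionSupport is omitted.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class GuardedElection {
    private final AtomicBoolean started = new AtomicBoolean(false);

    /** Rejects a second start() instead of making a second offer. */
    void start() {
        if (!started.compareAndSet(false, true)) {
            throw new IllegalStateException("start() already called");
        }
        // ... make the offer and determine the leader (omitted) ...
    }

    /** Rejects stop() when not started, so no spurious FAILED event is dispatched. */
    void stop() {
        if (!started.compareAndSet(true, false)) {
            throw new IllegalStateException("stop() called while not started");
        }
        // ... remove the offer and dispatch the stop events (omitted) ...
    }

    boolean isStarted() {
        return started.get();
    }
}
```

Silently ignoring the repeated call instead of throwing is an equally reasonable design; throwing makes the misuse visible to the caller.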
85331 No Perforce job exists for this issue. 1 2558
3 years, 39 weeks, 2 days ago 0|i00sjz:
ZooKeeper ZOOKEEPER-1224

problem across zookeeper clients when reading data written by other clients

Bug Resolved Minor Not A Problem Laxman amith amith 13/Oct/11 00:54   18/Oct/11 04:09 18/Oct/11 04:09 3.3.0 3.5.0 java client   0 2 2419200 2419200 0% Zookeeper console client (i.e, zkCli.sh )
and ZkClient
with 3 zookeeper quorum
Create a Java client, create a persistent node using that client, and write data into the node, like:
{code}
ZkClient zk = new ZkClient(getZKServers());
zk.createPersistent("/amith", true);
zk.writeData("/amith", "amith");
Object readData = zk.readData("/amith");
LOGGER.logInfo(readData);

zk.delete("/amith");
{code}

Then try to read the same node using the zkCli.sh console client:
{noformat}
[zk: XXX.XXX.XXX.XXX:XXXXX(CONNECTED) 2] get /amith
��tamith
cZxid = 0x100000004
ctime = Wed Oct 12 10:13:15 CST 2011
mZxid = 0x100000005
mtime = Wed Oct 12 10:13:15 CST 2011
pZxid = 0x100000004
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 12
numChildren = 0
{noformat}

The data is displayed as ��tamith, which includes some unwanted characters.
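The extra characters are consistent with Java serialization: as far as we know, ZkClient (a third-party client, not part of ZooKeeper) uses Java serialization as its default serializer, so writeData stores a serialized String rather than its raw bytes. The sketch below reproduces those bytes with a plain ObjectOutputStream; ZkClient itself is not used, and the class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

final class SerializedString {
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static String hex(byte[] b) {
        StringBuilder sb = new StringBuilder();
        for (byte x : b) {
            sb.append(String.format("%02x ", x & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = serialize("amith");
        System.out.println(data.length); // 12, matching dataLength above
        System.out.println(hex(data));   // ac ed 00 05 74 00 05 61 6d 69 74 68
    }
}
```

The stream header 0xAC 0xED renders as the two replacement characters, 0x00 0x05 are invisible control bytes, 0x74 (the string marker) is the 't', and 0x00 0x05 is the string length. Writing raw bytes, for example through the plain ZooKeeper API with "amith".getBytes(StandardCharsets.UTF_8), avoids the prefix.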
0% 0% 2419200 2419200 85289 No Perforce job exists for this issue. 0 32657
8 years, 23 weeks, 2 days ago 0|i05yb3:
ZooKeeper ZOOKEEPER-1223

C recipes includes <zookeeper.h> instead of <zookeeper/zookeeper.h>

Bug Open Trivial Unresolved Unassigned June Fang June Fang 12/Oct/11 23:51   29/Dec/11 11:17   3.3.3   recipes   0 1 7200 7200 0% CentOS 5 According to ZOOKEEPER-1033, headers are installed into the "PREFIX/zookeeper" directory.
I guess these includes may also need to be changed?
0% 0% 7200 7200 85284 No Perforce job exists for this issue. 0 32658
8 years, 13 weeks ago 0|i05ybb:
ZooKeeper ZOOKEEPER-1222

getACL should only call DataTree.copyStat when passed in stat is not null

Bug Resolved Minor Fixed Michi Mutsuzaki Camille Fournier Camille Fournier 12/Oct/11 17:08   08/Jul/14 17:17 08/Jul/14 14:33 3.3.3, 3.4.0 3.4.7, 3.5.0 java client   0 5   getACL(String, Stat) should allow the stat object to be null in case the user doesn't care about getting the stat back, as other methods with similar signatures do. 84819 No Perforce job exists for this issue. 3 32659
5 years, 37 weeks, 2 days ago 0|i05ybj:
ZooKeeper ZOOKEEPER-1221

ZOOKEEPER-1198 Provide accessors for Request.{hdr|txn}

Sub-task Resolved Minor Fixed Thomas Koch Thomas Koch Thomas Koch 12/Oct/11 09:20   21/Oct/11 06:55 20/Oct/11 14:03   3.5.0     0 0   I'm working on a larger patch that makes the Request class immutable. To see where the hdr and txn fields are modified, it helped to introduce accessor methods. The JVM should happily inline the method calls, so no performance overhead should be expected.

There's a minor, unrelated change included: ToBeAppliedRequestProcessor had a reference to the toBeApplied list of the Leader, so it was hard to find all places where this list was actually modified. Instead, the patch passes the leader instance to ToBeAppliedRequestProcessor, and the processor then accesses leader.toBeApplied.
74143 No Perforce job exists for this issue. 3 33313
8 years, 22 weeks, 6 days ago
Reviewed
0|i062cv:
ZooKeeper ZOOKEEPER-1220

./zkCli.sh 'create' command is throwing ArrayIndexOutOfBoundsException

Bug Resolved Major Fixed kavita sharma kavita sharma kavita sharma 12/Oct/11 06:47   15/Dec/11 06:58 14/Dec/11 17:59 3.3.3 3.5.0 scripts   0 4   A few problems occur when executing the create command.

If we enter commands like the following:

1)[zk: localhost:2181(CONNECTED) 0] create -s -e /node1
{noformat}
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 4
at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:692)
at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:593)
at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:365)
at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:323)
at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:282)
{noformat}
but it should create an ephemeral sequential node.

2)[zk: localhost:2181(CONNECTED) 0] create -s -e
{noformat}
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 3
{noformat}
Here it should print the list of commands, which is zkCli's default behaviour for invalid/incomplete commands.

3)[zk: localhost:2181(CONNECTED) 3] create -s -e "data"
{noformat}
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 4
{noformat}
Here the command is wrong, so it should print the list of commands.

4)[zk: localhost:2181(CONNECTED) 0] create /node1
zkCli treats this as an invalid command because of the args.length >= 3 check, but if the user hasn't given any of the options, it should create a persistent node.
{noformat}
if (cmd.equals("create") && args.length >= 3) {
int first = 0;
CreateMode flags = CreateMode.PERSISTENT;
{noformat}
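A sketch of parsing that would give the behaviour expected in cases 1 to 4: flags first, then a mandatory absolute path, then optional data. The class is hypothetical, Mode is a local stand-in for org.apache.zookeeper.CreateMode, and the empty default for missing data is an assumption; any further arguments (such as an ACL) are ignored here.

```java
import java.util.List;

final class CreateArgs {
    /** Local stand-in for org.apache.zookeeper.CreateMode. */
    enum Mode { PERSISTENT, PERSISTENT_SEQUENTIAL, EPHEMERAL, EPHEMERAL_SEQUENTIAL }

    final Mode mode;
    final String path;
    final String data;

    private CreateArgs(Mode mode, String path, String data) {
        this.mode = mode;
        this.path = path;
        this.data = data;
    }

    /** Parses "create [-s] [-e] path [data]"; returns null so the caller can print usage. */
    static CreateArgs parse(List<String> args) {
        if (args.isEmpty() || !args.get(0).equals("create")) {
            return null;
        }
        boolean sequential = false;
        boolean ephemeral = false;
        int i = 1;
        for (; i < args.size(); i++) {
            String a = args.get(i);
            if (a.equals("-s")) {
                sequential = true;
            } else if (a.equals("-e")) {
                ephemeral = true;
            } else {
                break;
            }
        }
        if (i >= args.size() || !args.get(i).startsWith("/")) {
            return null; // path missing or not absolute: cases 2 and 3
        }
        String data = i + 1 < args.size() ? args.get(i + 1) : ""; // data is optional
        Mode mode = ephemeral
                ? (sequential ? Mode.EPHEMERAL_SEQUENTIAL : Mode.EPHEMERAL)
                : (sequential ? Mode.PERSISTENT_SEQUENTIAL : Mode.PERSISTENT);
        return new CreateArgs(mode, args.get(i), data);
    }
}
```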
73885 No Perforce job exists for this issue. 4 32660
8 years, 15 weeks ago
Reviewed
0|i05ybr:
ZooKeeper ZOOKEEPER-1219

LeaderElectionSupport recipe is unnecessarily dispatching the READY_START event even if the ELECTED node stopped/expired simultaneously.

Improvement Resolved Major Fixed Rakesh Radhakrishnan Rakesh Radhakrishnan Rakesh Radhakrishnan 11/Oct/11 08:29   30/Mar/14 03:07 29/Mar/14 19:02 3.3.3 3.5.0 recipes   0 5   Say a node has been determined as READY and has dispatched the DETERMINE_COMPLETE event, and at the same time the ELECTED node is stopped or expires. The framework still first dispatches the READY_START event to the node and only then checks whether the ELECTED node exists(). It finds no 'Stat' for the ELECTED node and goes back to the leader determination phase.

*Problem:*
The READY_START event is dispatched to the node unnecessarily, telling it to proceed with startup/init even though there is no ELECTED node.

*Proposal*
Reverse the logic: first check whether the ELECTED node exists(), and only if it does, let the framework dispatch the READY_START event. Otherwise, go to the leader determination phase.
59118 No Perforce job exists for this issue. 1 2556
5 years, 51 weeks, 4 days ago 0|i00sjj:
ZooKeeper ZOOKEEPER-1218

zktreeutil tool enhancement

Improvement Open Major Unresolved Anirban Roy Anirban Roy Anirban Roy 08/Oct/11 04:57   14/Dec/19 06:08   3.4.0 3.7.0 contrib 14/Oct/11 2 3 604800 0 604800 100% GNU/Linux i386/i686/x86_64 ============================================
zktreeutil - Zookeeper Tree Data Utility
Author: Anirban Roy (r_anirban at yahoo.com)
Organization: Yahoo Inc.
============================================

zktreeutil program is intended to manage and manipulate zk-tree data quickly,
efficiently and with ease. The utility operates on free-form ZK-tree and hence
can be used for any cluster managed by Zookeeper. Here are the basic
functionalities -

EXPORT: The whole/partial ZK-tree is exported into a XML file. This helps in
capturing a current snapshot of the data for backup/analysis. For a subtree
export, one needs to specify the path to the ZK subtree with the proper option.
Since ZooKeeper stores binary data against znodes, the data dumped to the XML file
is base64-encoded with an attribute "encode=true". Optionally, one may specify
not to encode data with the --noencode option if the data stored in ZooKeeper is
guaranteed to be text data.

IMPORT: The ZK (sub)tree can be imported from XML into ZK cluster. This helps in
priming the new ZK cluster with static configuration. The import can be
non-intrusive by making only additions and modifications in the existing data.
One may optionally delete existing (sub)tree before importing the new data
with the --force option. The znodes which carry an attribute "encode=true" will be
decoded and written to zookeeper.

DIFF: Creates a diff between live ZK data vs data saved in XML file. Diff can
ignore some ZK-tree branches (possibly dynamic data) on reading the optional
ignore flag from XML file. A diff of a ZK subtree is taken by providing the
path to the ZK subtree with the diff command.

UPDATE: Makes incremental changes to the live ZK tree from saved XML,
essentially after running the diff.

DUMP: Dumps the ZK (sub)tree on the standard output device reading either from
live ZK server or XML file.
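The export/import encoding rule described above can be illustrated in Java with java.util.Base64. This is only a sketch of the rule (zktreeutil itself is C++); the class name is hypothetical, and decoding unencoded text as UTF-8 is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class ZnodeDataCodec {
    /** Text to write into the XML file, plus whether encode="true" must be set. */
    static final class Encoded {
        final String text;
        final boolean encoded;

        Encoded(String text, boolean encoded) {
            this.text = text;
            this.encoded = encoded;
        }
    }

    /** EXPORT: base64 by default; raw text only with --noencode. */
    static Encoded export(byte[] data, boolean noEncode) {
        if (noEncode) {
            // Only safe when the znode is guaranteed to hold text.
            return new Encoded(new String(data, StandardCharsets.UTF_8), false);
        }
        return new Encoded(Base64.getEncoder().encodeToString(data), true);
    }

    /** IMPORT: decode only nodes carrying encode="true". */
    static byte[] importData(String text, boolean encodeAttr) {
        return encodeAttr ? Base64.getDecoder().decode(text) : text.getBytes(StandardCharsets.UTF_8);
    }
}
```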

The exported ZK data into XML file can be shortened by only keeping the static
ZK nodes which are required to prime an application. The dynamic zk nodes
(created on-the-fly) can be ignored by setting an 'ignore' attribute at the root
node of the dynamic subtree (see tests/zk_sample.xml), possibly deleting all
inner ZK nodes under that. Once ignored, the whole subtree is ignored during
DIFF, UPDATE and WRITE.

Pre-requisites
--------------
1. Linux system with 2.6.X kernel.
2. Zookeeper C client library (locally built at ../../c/.libs) >= 3.X.X
3. Development build libraries (rpm packages):
a. boost-devel >= 1.32.0
b. libxml2-devel >= 2.6.26
c. log4cxx-devel >= 0.9.7-7
d. openssl-devel >= 0.9.7a
e. cppunit >= 1.12.0-2

Build instructions
------------------
1. cd into this directory
2. autoreconf -if
3. ./configure # Configure the build env
4. make # Build the tool
5. make check # Run unit-tests
6. ./src/zktreeutil --help # Usage help

Testing and usage of zktreeutil
--------------------------------
1. Run Zookeeper server locally on port 2181
2. export LD_LIBRARY_PATH=../../c/.libs/:/usr/local/lib/
3. ./src/zktreeutil --help # show help
4. ./src/zktreeutil --zookeeper=localhost:2181 --import --xmlfile=tests/zkdata_test.xml 2>/dev/null # import sample ZK tree
5. ./src/zktreeutil --zookeeper=localhost:2181 --dump --path=/myapp/version-1.0 2>/dev/null # dump Zk subtree
6. ./src/zktreeutil --zookeeper=localhost:2181 --dump --depth=3 2>/dev/null # dump Zk tree till a certain depth
7. ./src/zktreeutil --xmlfile=zkdata_test.xml -D 2>/dev/null # dump the xml data
8. Change zkdata_test.xml by adding/deleting/changing some nodes
9. ./src/zktreeutil -z localhost:2181 -F -x zkdata_test.xml -p /myapp/version-1.0/configuration 2>/dev/null # take a diff of changes
10. ./src/zktreeutil -z localhost:2181 -E --noencode 2>/dev/null > zk_sample2.xml # export the modified ZK tree
11. ./src/zktreeutil -z localhost:2181 -U -x zkdata_test.xml -p /myapp/version-1.0/distributions 2>/dev/null # update with incr. changes
12. ./src/zktreeutil --zookeeper=localhost:2181 --import --force --xmlfile=zk_sample2.xml 2>/dev/null # re-prime the ZK tree

For more details of usage, please see the unit tests. Hope this helps. Please
reach out to me for any bugs, comments or suggestions.
100% 100% 604800 0 604800 patch 50570 No Perforce job exists for this issue. 1 2509
5 years, 51 weeks, 3 days ago 1. Export/import capability of binary data
2. Null data handling
3. Export/import/dump/diff capability of subtree
4. Efficient subtree handling
5. Improved logging
6. Improved testability with unittests
7. Option to dump/export on file
8. Fix to handle new state introduced in ZOOKEEPER-1108
zktreeutil 0|i00s93:
ZooKeeper ZOOKEEPER-1217

ZOOKEEPER-1198 Remove unnecessary MissingSessionException in ZooKeeperServer

Sub-task Patch Available Minor Unresolved Thomas Koch Thomas Koch Thomas Koch 07/Oct/11 10:59   29/Oct/11 06:39           0 1   MissingSessionException is only thrown and caught once inside this class and can just as well be replaced by a boolean return value.

While I'm at it: the method throwing this exception would make more sense inlined at the one place where it is called.
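A minimal Java sketch of the proposed change, assuming a simplified set of live sessions. SessionCheckExample, handle() and liveSessions are hypothetical names, not the actual ZooKeeperServer members. The exception-throwing helper is replaced by an inlined check whose boolean result drives the branch directly.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for ZooKeeperServer's session bookkeeping; names are illustrative.
final class SessionCheckExample {
    private final Set<Long> liveSessions = new HashSet<>();

    void add(long sessionId) {
        liveSessions.add(sessionId);
    }

    String handle(long sessionId) {
        // Old shape (hypothetical): try { checkSession(id); ... } catch (MissingSessionException e) { ... }
        // New shape: the check is inlined and its boolean result selects the branch.
        if (!liveSessions.contains(sessionId)) {
            return "session 0x" + Long.toHexString(sessionId) + " not found";
        }
        return "processing 0x" + Long.toHexString(sessionId);
    }
}
```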
50304 No Perforce job exists for this issue. 5 42028
8 years, 21 weeks, 5 days ago 0|i07k4n:
ZooKeeper ZOOKEEPER-1216

ZOOKEEPER-1198 Fix more eclipse compiler warnings, also in Tests

Sub-task Resolved Minor Fixed Thomas Koch Thomas Koch Thomas Koch 07/Oct/11 10:06   25/Oct/11 06:56 25/Oct/11 02:08   3.5.0     0 1   I set up a new work environment for a presentation of Eclipse+EGit+Gerrit+Jenkins and found more warnings that had been ignored on my machine.
Warnings are now down to 5! So no excuses to introduce new ones!

Fixed warnings:
- removed unused imports
- removed unused variables / methods
- added missing generics
- added ignore warnings for calls to deprecated code in tests
50193 No Perforce job exists for this issue. 4 33314
8 years, 22 weeks, 2 days ago
Reviewed
0|i062d3:
ZooKeeper ZOOKEEPER-1215

C client persisted cache

New Feature Open Major Unresolved Marc Celani Marc Celani Marc Celani 06/Oct/11 17:19   21/Dec/11 12:05       c client   1 3   Motivation:
1. Reduce the impact of client restarts on zookeeper by implementing a persisted cache, and only fetching deltas on restart
2. Reduce unnecessary calls to zookeeper.
3. Improve performance of gets by caching on the client
4. Allow larger caches than in-memory caches can hold.

Behavior Change:
Zookeeper clients will now have the option to specify a folder path where they can cache zookeeper gets. If they choose to cache results, the zookeeper library will check the persisted cache before actually sending a request to zookeeper. Watches will automatically be placed on all gets in order to invalidate the cache. Alternatively, we can add a cache flag to the get API - thoughts? On reconnect or restart, zookeeper clients will check the version number of each entry in their persisted cache and will invalidate any out-of-date entries. While checking version numbers, zookeeper clients will also place a watch on those files. Regarding watches, client watch handlers will not fire until the invalidation step is completed, which may slow down client watch handling. Since setting up watches on all files is necessary on initialization, initialization will likely slow down as well.

API Change:
The zookeeper library will expose a new init interface that specifies a folder path to the cache. A new get API will specify whether or not to use cache, and whether or not stale data is safe to return if the connection is down.

Design:
The zookeeper handler structure will now include a cache_root_path (possibly null) string to cache all gets, as well as a bool for whether or not it is okay to serve stale data. Old API calls will default to a null path (which signifies no cache), and signify that it is not okay to serve stale data.

The cache will be located at a cache_root_path. All files will be placed at cache_root_path/file_path. The cache will be an incomplete copy of everything that is in zookeeper, but everything in the cache will have the same relative path from the cache_root_path that it has as a path in zookeeper. Each file in the cache will include the Stat structure and the file contents.

zoo_get will check the zookeeper handler to determine whether or not it has a cache. If it does, it will first go to the path to the persisted cache and append the get path. If the file exists and it is not invalidated, the zookeeper client will read it and return its value. If the file does not exist or is invalidated, the zookeeper library will perform the same get as is currently designed. After getting the results, the library will place the value in the persisted cache for subsequent reads. zoo_set will automatically invalidate the path in the cache.

If caching is requested, then on each zoo_get that goes through to zookeeper, a watch will be placed on the path. A cache watch handler will handle all watch events by invalidating the cache, and placing another watch on it. Client watch handlers will handle the watch event after the cache watch handler. The cache watch handler will not call zoo_get, because it is assumed that the client watch handlers will call zoo_get if they need the fresh data as soon as it is invalidated (which is why the cache watch handler must be executed first).

All updates to the cache will be done on a separate thread, but will be queued in order to maintain consistency in the cache. In addition, all client watch handlers will not be fired until the cache watch handler completes its invalidation write in order to ensure that client calls to zoo_get in the watch event handler are done after the invalidation step. This means that a client watch handler could be waiting on SEVERAL writes before it can be fired off, since all writes are queued.

When a new connection is made, if a zookeeper handler has a cache, then that cache will be scanned in order to find all leaf nodes. Calls will be made to zookeeper to check if all of these nodes still exist, and if they do, what their version number is. Any inconsistencies in version will result in the cache invalidating the out of date files. Any files that no longer exist will be deleted from the cache.

If a connection fails, and a zoo_get call is made on a zookeeper handler that has a cache associated with it, and that cache tolerates stale data, then the stale data will be returned from cache - otherwise, all zoo_gets will error out as they do today.
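A rough Java sketch of the on-disk layout and invalidation rules described above, for discussion only. PersistedCache and its methods are hypothetical: the proposal targets the C client, and none of this is an existing ZooKeeper API. It deviates from the proposal in one respect: each node's Stat version and data are stored in a `.zknode` file inside a directory that mirrors the ZooKeeper path, because a znode can have both data and children, which a plain file at cache_root_path/file_path cannot represent.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical sketch of the proposed client-side persisted cache.
final class PersistedCache {
    static final class Entry {
        final int version;   // Stat version of the node when it was cached
        final byte[] data;
        Entry(int version, byte[] data) { this.version = version; this.data = data; }
    }

    private final Path cacheRoot;

    PersistedCache(Path cacheRoot) { this.cacheRoot = cacheRoot; }

    // "/app/config" -> cacheRoot/app/config/.zknode, so a node and its children can coexist.
    private Path fileFor(String zkPath) {
        return cacheRoot.resolve(zkPath.substring(1)).resolve(".zknode");
    }

    // Empty result means a miss or an invalidated entry: the caller goes to the server.
    Optional<Entry> read(String zkPath) {
        Path f = fileFor(zkPath);
        if (!Files.isRegularFile(f)) {
            return Optional.empty();
        }
        try {
            ByteBuffer buf = ByteBuffer.wrap(Files.readAllBytes(f));
            int version = buf.getInt();            // 4-byte header: cached version
            byte[] data = new byte[buf.remaining()];
            buf.get(data);
            return Optional.of(new Entry(version, data));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Called after a get that went through to the server.
    void write(String zkPath, int version, byte[] data) {
        Path f = fileFor(zkPath);
        try {
            Files.createDirectories(f.getParent());
            ByteBuffer buf = ByteBuffer.allocate(4 + data.length).putInt(version).put(data);
            Files.write(f, buf.array());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Called by the cache watch handler (before client handlers) and on zoo_set.
    void invalidate(String zkPath) {
        try {
            Files.deleteIfExists(fileFor(zkPath));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reconnect step: drop the entry if the server reports another version (-1 if the node is gone).
    void revalidate(String zkPath, int serverVersion) {
        read(zkPath).filter(e -> e.version != serverVersion).ifPresent(e -> invalidate(zkPath));
    }
}
```

Watch registration, the ordered write queue and the stale-read fallback are left out here; they would sit around these calls in the client library.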
49786 No Perforce job exists for this issue. 0 42029
8 years, 20 weeks, 1 day ago 0|i07k4v:
ZooKeeper ZOOKEEPER-1214

QuorumPeer should unregister only its previously registered MBeans instead of using the MBeanRegistry.unregisterAll() method.

Bug Resolved Major Fixed César Álvarez Núñez César Álvarez Núñez César Álvarez Núñez 05/Oct/11 05:56   20/May/14 07:09 17/May/14 23:49   3.5.0 quorum   0 6   When a QuorumPeer thread dies, it unregisters *all* ZKMBeanInfo MBeans previously registered in its Java process, including those that it did not register itself.

This has no side effects in a production environment, where each server runs in a separate Java process, but it fails when using "org.apache.zookeeper.test.QuorumUtil" to programmatically start up a ZooKeeper server ensemble and use its provided methods to force Disconnected, SyncConnected or SessionExpired events in order to perform some basic/functional testing.

Scenario:
* QuorumUtil qU = new QuorumUtil(1); // It creates a 3-server ensemble.
* qU.startAll(); // Starts up all servers: 1 Leader + 2 Followers
* qU.shutdown\(i\); // i is a number from 1 to 3. It shuts down one server.

The last call causes a QuorumPeer to die, invoking the MBeanRegistry.unregisterAll() method.
As a result, *all* ZKMBeanInfo MBeans are unregistered, including those belonging to the other QuorumPeer instances.

When trying to restart the previous server (qU.restart\(i\)), an AssertionError is thrown in the MBeanRegistry.register(ZKMBeanInfo bean, ZKMBeanInfo parent) method, causing the QuorumPeer thread to die.

To solve it:
* MBeanRegistry.unregisterAll() method has been removed.
* QuorumPeer only unregisters its own ZKMBeanInfo MBeans.
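A simplified Java sketch of the fix's idea: each peer remembers which beans it registered and removes only those on shutdown. The JMX MBeanServer is replaced by a plain map, and SharedRegistry and PeerBeans are hypothetical names, not the real MBeanRegistry API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical process-wide registry shared by several in-process peers (as with QuorumUtil).
final class SharedRegistry {
    private final Map<String, Object> beans = new LinkedHashMap<>();

    synchronized void register(String name, Object bean) {
        // Mirrors the failure mode: registering a name twice is an error.
        if (beans.putIfAbsent(name, bean) != null) {
            throw new IllegalStateException("already registered: " + name);
        }
    }

    synchronized void unregister(String name) { beans.remove(name); }

    synchronized boolean isRegistered(String name) { return beans.containsKey(name); }
}

// Each peer tracks its own registrations instead of calling a global unregisterAll().
final class PeerBeans {
    private final SharedRegistry registry;
    private final List<String> mine = new ArrayList<>();

    PeerBeans(SharedRegistry registry) { this.registry = registry; }

    void register(String name, Object bean) {
        registry.register(name, bean);
        mine.add(name);
    }

    void unregisterOwn() {
        for (String name : mine) {
            registry.unregister(name);
        }
        mine.clear();
    }
}
```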
46382 No Perforce job exists for this issue. 6 32661
5 years, 44 weeks, 2 days ago 0|i05ybz:
ZooKeeper ZOOKEEPER-1213

ZOOKEEPER-1263 ZooKeeper server startup fails if configured only with the 'minSessionTimeout' and not 'maxSessionTimeout'

Sub-task Resolved Major Fixed Rakesh Radhakrishnan Rakesh Radhakrishnan Rakesh Radhakrishnan 04/Oct/11 10:02   25/Mar/14 17:15 25/Mar/14 17:15 3.3.3 3.5.0 server   0 1   I have configured only the 'minSessionTimeout' and not configured 'maxSessionTimeout' in the zoo.cfg file as follows

+zoo.cfg+

tickTime=2000
minSessionTimeout=10000

I'm seeing the following exception, and the ZooKeeper server does not start:

{noformat}
2011-10-07 23:39:10,546 - INFO [main:QuorumPeerConfig@100] - Reading configuration from: /home/rakeshr/zookeeper/bin/../conf/zoo.cfg
2011-10-07 23:39:12,334 - ERROR [main:QuorumPeerMain@85] - Invalid config, exiting abnormally
org.apache.zookeeper.server.quorum.QuorumPeerConfig$ConfigException: Error processing /home/rakeshr/zookeeper/bin/../conf/zoo.cfg
at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:120)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:101)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
Caused by: java.lang.IllegalArgumentException: minSessionTimeout must not be larger than maxSessionTimeout
at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parseProperties(QuorumPeerConfig.java:265)
at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:116)
... 2 more
{noformat}


Startup fails due to the following validation: maxSessionTimeout is still -1 at this point, rather than its default upper limit (tickTime * 20).

{noformat}
/** defaults to -1 if not set explicitly */
protected int maxSessionTimeout = -1;

if (minSessionTimeout > maxSessionTimeout) {
throw new IllegalArgumentException(
"minSessionTimeout must not be larger than maxSessionTimeout");
}
{noformat}
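A sketch of one way to avoid the failure, assuming the documented defaults (minSessionTimeout = 2 * tickTime, maxSessionTimeout = 20 * tickTime): resolve the -1 sentinels to their defaults first, then validate the effective values. SessionTimeoutBounds is a hypothetical class, not the patch that was committed.

```java
// Hypothetical: resolve unset (-1) session-timeout bounds before validating them.
final class SessionTimeoutBounds {
    final int min;
    final int max;

    SessionTimeoutBounds(int tickTime, int configuredMin, int configuredMax) {
        // Defaults assumed from the admin guide: 2x and 20x tickTime.
        this.min = configuredMin == -1 ? tickTime * 2 : configuredMin;
        this.max = configuredMax == -1 ? tickTime * 20 : configuredMax;
        // Validate the effective values, not the raw -1 sentinel.
        if (min > max) {
            throw new IllegalArgumentException(
                "minSessionTimeout must not be larger than maxSessionTimeout");
        }
    }
}
```

With the zoo.cfg above (tickTime=2000, minSessionTimeout=10000), the effective bounds become 10000 to 40000 and validation passes.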
44362 No Perforce job exists for this issue. 1 42030
6 years, 2 days ago 0|i07k53:
ZooKeeper ZOOKEEPER-1212

zkServer.sh stop action is not conformant with LSB para 20.2 Init Script Actions

Bug Closed Major Fixed Roman Shaposhnik Roman Shaposhnik Roman Shaposhnik 03/Oct/11 16:24   23/Nov/11 14:22 19/Oct/11 02:42 3.3.3, 3.4.0, 3.5.0 3.3.4, 3.4.0, 3.5.0 scripts   0 1   According to LSB Core para 20.2:
==================================================================================
Otherwise, the exit status shall be non-zero, as defined below.
In addition to straightforward success, the following situations are
also to be considered successful:
• restarting a service (instead of reloading it) with the force-reload argument
• running start on a service already running
• running stop on a service already stopped or not running
• running restart on a service already stopped or not running
• running try-restart on a service already stopped or not running
==================================================================================

Yet, zkServer.sh fails on stop if it can't find a PID file:

{noformat}
stop)
echo -n "Stopping zookeeper ... "
if [ ! -f "$ZOOPIDFILE" ]
then
echo "error: could not find file $ZOOPIDFILE"
exit 1
else
$KILL -9 $(cat "$ZOOPIDFILE")
rm "$ZOOPIDFILE"
echo STOPPED
exit 0
fi
{noformat}
43879 No Perforce job exists for this issue. 2 30001
8 years, 23 weeks, 1 day ago 0|i05hxb:
ZooKeeper ZOOKEEPER-1211

C client's package name

Bug Resolved Trivial Duplicate Unassigned June Fang June Fang 30/Sep/11 05:07   12/Oct/11 22:28 12/Oct/11 22:28 3.3.3   c client   0 0 3600 3600 0% centos 5 The package name of the C client is "c-client-src",
which causes the include files to be installed to /usr/local/include/c-client-src.

It's a bit annoying, since users need to manually rename it to zookeeper.

I think there are two possible fixes:
1) change the autoconf package name to "zookeeper"; the headers will then be installed to the
zookeeper subdir, which is consistent with the README;
2) change pkginclude_HEADERS to include_HEADERS, which will install the headers to /usr/local/include.
0% 0% 3600 3600 40983 No Perforce job exists for this issue. 0 32662
8 years, 24 weeks ago 0|i05yc7:
ZooKeeper ZOOKEEPER-1210

Can't build ZooKeeper RPM with RPM >= 4.6.0 (i.e. on RHEL 6 and Fedora >= 10)

Bug Resolved Minor Fixed Tadeusz Andrzej Kadłubowski Tadeusz Andrzej Kadłubowski Tadeusz Andrzej Kadłubowski 28/Sep/11 10:20   30/Jun/12 07:01 30/Jun/12 02:19 3.4.0 3.3.6, 3.4.4 build   0 6   Tested to fail on both Centos 6.0 and Fedora 14 I was trying to build the zookeeper RPM (basically, `ant rpm -Dskip.contrib=1`), using build scripts that were recently merged from the work on the ZOOKEEPER-999 issue.

The final stage, i.e. running rpmbuild, failed. From what I understand, it mixed up the BUILD and BUILDROOT subdirectories in /tmp/zookeeper_package_build_tkadlubo/, leaving BUILDROOT empty and placing everything in BUILD.

The full build log is at http://pastebin.com/0ZvUAKJt (Caution: I cut out long file listings from running tar -xvvf).
patch 36676 No Perforce job exists for this issue. 3 32663
7 years, 38 weeks, 5 days ago Fix buildroot misplacement on systems with RPM >= 4.6. Earlier RPM versions also support the --buildroot command-line flag, so this doesn't break anything on older systems.
Reviewed
rpm ant 0|i05ycf:
ZooKeeper ZOOKEEPER-1209

LeaderElection recipe doesn't handle the split-brain issue; a network disconnection can put both client nodes in the ELECTED state

Bug Patch Available Major Unresolved Rakesh Radhakrishnan Rakesh Radhakrishnan Rakesh Radhakrishnan 28/Sep/11 02:53   05/Feb/20 07:11   3.3.3 3.7.0, 3.5.8 recipes   0 5   *Case 1:* A network disconnection can put both client nodes in the ELECTED state. The current LeaderElectionSupport (LES) framework handles only 'NodeDeletion'.

Consider the scenario where ELECTED and READY nodes are running. Say the ELECTED node's network fails and it is "Disconnected" from ZooKeeper. It still behaves as ELECTED, because it does not get any events from the LeaderElectionSupport (LES) framework.
After the session timeout, the node in READY state is notified by a 'NodeDeleted' event and goes to the ELECTED state.
*Problem:*
Both nodes become ELECTED, so the user sees two master (ELECTED) nodes, which causes inconsistencies.


*Case 2:* Similarly, suppose the user has started only one client node and it becomes ELECTED. Some time later, the network connection to the ZooKeeper server is lost and the session expires.
*Problem:*
The client node still stays in the ELECTED state. If the user later starts a second client node, both nodes again become ELECTED.
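A minimal Java sketch of the missing behavior: the participant steps down on Disconnected or Expired instead of relying solely on NodeDeleted. ElectionParticipant and its enums are hypothetical and do not mirror the LeaderElectionSupport API. Stepping down on Disconnected, not only on Expired, is an assumed conservative choice: a disconnected leader cannot tell whether the server has already expired its session.

```java
// Hypothetical leader-election participant that also reacts to connection-state events.
final class ElectionParticipant {
    enum State { READY, ELECTED, FAILED }
    enum ConnectionEvent { SYNC_CONNECTED, DISCONNECTED, EXPIRED }

    private State state = State.READY;

    State state() { return state; }

    // Existing LES trigger: the predecessor znode was deleted.
    void onPredecessorDeleted() {
        if (state == State.READY) {
            state = State.ELECTED;
        }
    }

    // Missing piece: stop acting as leader once the session can no longer be trusted.
    void onConnectionEvent(ConnectionEvent event) {
        switch (event) {
            case DISCONNECTED:
            case EXPIRED:
                if (state == State.ELECTED) {
                    state = State.FAILED;   // step down; rejoin the election after reconnecting
                }
                break;
            case SYNC_CONNECTED:
                break;  // a real implementation would re-check its znode and rejoin here
        }
    }
}
```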
34398 No Perforce job exists for this issue. 1 2557
3 years, 39 weeks, 2 days ago 0|i00sjr:
ZooKeeper ZOOKEEPER-1208

Ephemeral node not removed after the client session is long gone

Bug Closed Blocker Fixed Patrick D. Hunt kishore gopalakrishna kishore gopalakrishna 28/Sep/11 00:35   23/Nov/11 14:22 14/Nov/11 14:34 3.3.3 3.3.4, 3.4.0, 3.5.0     3 12   Copying from email thread.


We found our ZK server in a state where an ephemeral node still exists after
a client session is long gone. I used the cons command on each ZK host to
list all connections and couldn't find the ephemeralOwner id. We are using
ZK 3.3.3. Has anyone seen this problem?

I got the following information from the logs.

The node that still exists is /kafka-tracking/consumers/UserPerformanceEvent-<host>/owners/UserPerformanceEvent/529-7

I saw that the ephemeral owner is 86167322861045079 which is session id 0x13220b93e610550.
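The ephemeralOwner value quoted here is a session id in decimal, while the transaction log prints session ids in hex. A small Java sketch of the conversion follows; SessionIds is a hypothetical helper. By this arithmetic, 86167322861045079 is 0x13220b93e610557, a session that also appears in the log excerpts below, while 0x13220b93e610550 is 86167322861045072.

```java
// Hypothetical helper for matching a decimal ephemeralOwner against hex session ids in logs.
final class SessionIds {
    static String toHex(long sessionId) {
        return "0x" + Long.toHexString(sessionId);
    }

    static long fromHex(String sessionId) {
        // Long.parseLong does not accept a "0x" prefix, so strip it first.
        String digits = sessionId.startsWith("0x") ? sessionId.substring(2) : sessionId;
        return Long.parseLong(digits, 16);
    }
}
```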

After searching the transaction log of one of the ZK servers, I found that the session expired:

9/22/11 12:17:57 PM PDT session 0x13220b93e610550 cxid 0x74 zxid 0x601bd36f7 closeSession null

On digging further into the logs, I found that multiple sessions were created in quick succession and every session tried to create the same node. But I verified that the sessions were closed and opened in order:
9/22/11 12:17:56 PM PDT session 0x13220b93e610550 cxid 0x0 zxid 0x601bd36b5 createSession 6000
9/22/11 12:17:57 PM PDT session 0x13220b93e610550 cxid 0x74 zxid 0x601bd36f7 closeSession null
9/22/11 12:17:58 PM PDT session 0x13220b93e610551 cxid 0x0 zxid 0x601bd36f8 createSession 6000
9/22/11 12:17:59 PM PDT session 0x13220b93e610551 cxid 0x74 zxid 0x601bd373a closeSession null
9/22/11 12:18:00 PM PDT session 0x13220b93e610552 cxid 0x0 zxid 0x601bd373e createSession 6000
9/22/11 12:18:01 PM PDT session 0x13220b93e610552 cxid 0x6c zxid 0x601bd37a0 closeSession null
9/22/11 12:18:02 PM PDT session 0x13220b93e610553 cxid 0x0 zxid 0x601bd37e9 createSession 6000
9/22/11 12:18:03 PM PDT session 0x13220b93e610553 cxid 0x74 zxid 0x601bd382b closeSession null
9/22/11 12:18:04 PM PDT session 0x13220b93e610554 cxid 0x0 zxid 0x601bd383c createSession 6000
9/22/11 12:18:05 PM PDT session 0x13220b93e610554 cxid 0x6a zxid 0x601bd388f closeSession null
9/22/11 12:18:06 PM PDT session 0x13220b93e610555 cxid 0x0 zxid 0x601bd3895 createSession 6000
9/22/11 12:18:07 PM PDT session 0x13220b93e610555 cxid 0x6a zxid 0x601bd38cd closeSession null
9/22/11 12:18:10 PM PDT session 0x13220b93e610556 cxid 0x0 zxid 0x601bd38d1 createSession 6000
9/22/11 12:18:11 PM PDT session 0x13220b93e610557 cxid 0x0 zxid 0x601bd38f2 createSession 6000
9/22/11 12:18:11 PM PDT session 0x13220b93e610557 cxid 0x51 zxid 0x601bd396a closeSession null

Here is the log output for the sessions that tried to create the same node:

9/22/11 12:17:54 PM PDT session 0x13220b93e61054f cxid 0x42 zxid 0x601bd366b create '/kafka-tracking/consumers/UserPerformanceEvent-<hostname>/owners/UserPerformanceEvent/529-7
9/22/11 12:17:56 PM PDT session 0x13220b93e610550 cxid 0x42 zxid 0x601bd36ce create '/kafka-tracking/consumers/UserPerformanceEvent-<hostname>/owners/UserPerformanceEvent/529-7
9/22/11 12:17:58 PM PDT session 0x13220b93e610551 cxid 0x42 zxid 0x601bd3711 create '/kafka-tracking/consumers/UserPerformanceEvent-<hostname>/owners/UserPerformanceEvent/529-7
9/22/11 12:18:00 PM PDT session 0x13220b93e610552 cxid 0x42 zxid 0x601bd3777 create '/kafka-tracking/consumers/UserPerformanceEvent-<hostname>/owners/UserPerformanceEvent/529-7
9/22/11 12:18:02 PM PDT session 0x13220b93e610553 cxid 0x42 zxid 0x601bd3802 create '/kafka-tracking/consumers/UserPerformanceEvent-<hostname>/owners/UserPerformanceEvent/529-7
9/22/11 12:18:05 PM PDT session 0x13220b93e610554 cxid 0x44 zxid 0x601bd385d create '/kafka-tracking/consumers/UserPerformanceEvent-<hostname>/owners/UserPerformanceEvent/529-7
9/22/11 12:18:07 PM PDT session 0x13220b93e610555 cxid 0x44 zxid 0x601bd38b0 create '/kafka-tracking/consumers/UserPerformanceEvent-<hostname>/owners/UserPerformanceEvent/529-7
9/22/11 12:18:11 PM PDT session 0x13220b93e610557 cxid 0x52 zxid 0x601bd396b create '/kafka-tracking/consumers/UserPerformanceEvent-<hostname>/owners/UserPerformanceEvent/529-7

Let me know if you need additional information.
34379 No Perforce job exists for this issue. 4 32664
8 years, 19 weeks, 2 days ago trunk version 1201832 0|i05ycn:
ZooKeeper ZOOKEEPER-1207

strange ReadOnlyZooKeeperServer ERROR when starting ensemble

Bug Resolved Critical Invalid Unassigned Patrick D. Hunt Patrick D. Hunt 27/Sep/11 16:04   25/Apr/14 15:28 25/Apr/14 15:28   3.5.0 quorum, server   0 1   I'm seeing a strange ERROR message when starting an ensemble:

{noformat}
2011-09-27 13:00:08,168 [myid:3] - ERROR [Thread-2:QuorumPeer$1@689] - FAILED to start ReadOnlyZooKeeperServer
java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method)
at org.apache.zookeeper.server.quorum.QuorumPeer$1.run(QuorumPeer.java:684)
{noformat}

I did not specify ReadOnlyZooKeeperServer, so why is this logged at ERROR level? I'm not sure what the expected behavior is here. Is r/o mode turned on by default? It seems we should have this as a config option, off by default.
33620 No Perforce job exists for this issue. 0 32665
5 years, 47 weeks, 6 days ago Thanks Rakesh. I'm closing this as invalid since this has been fixed by ZOOKEEPER-1268. 0|i05ycv:
ZooKeeper ZOOKEEPER-1206

Sequential node creation does not always use digits in the node name under certain locales.

Bug Closed Minor Fixed Mark Miller Mark Miller Mark Miller 27/Sep/11 12:26   23/Nov/11 14:22 29/Sep/11 17:36 3.3.3 3.3.4, 3.4.0, 3.5.0 server   0 1   While I always expect to be able to parse a sequential node name by looking for digits, under some locales you end up with non-digits - for example: n_००००००००००

It looks like the problem is around line 236 in PrepRequestProcessor:

{code}
if (createMode.isSequential()) {
path = path + String.format("%010d", parentCVersion);
}
{code}

Instead we should pass Locale.ENGLISH to the format call.

{code}
if (createMode.isSequential()) {
path = path + String.format(Locale.ENGLISH, "%010d", parentCVersion);
}
{code}

Lucene/Solr tests run with random locales, and some of my tests that inspect node names and order them expect to find digits - currently my leader election recipe randomly fails when the wrong locale comes up.
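A small Java sketch of the proposed fix in isolation: the sequence suffix is formatted with an explicit Locale.ENGLISH, so the JVM's default locale cannot change the digits. SequentialNames is a hypothetical helper, not PrepRequestProcessor itself.

```java
import java.util.Locale;

// Hypothetical helper: build and parse sequential node names independently of the default locale.
final class SequentialNames {
    static String append(String path, int parentCVersion) {
        // Locale.ENGLISH pins the padding and digits to ASCII 0-9.
        return path + String.format(Locale.ENGLISH, "%010d", parentCVersion);
    }

    static int sequenceOf(String nodeName) {
        // The sequence suffix is always the last 10 characters.
        return Integer.parseInt(nodeName.substring(nodeName.length() - 10));
    }
}
```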
19505 No Perforce job exists for this issue. 3 32666
8 years, 25 weeks, 6 days ago
Reviewed
0|i05yd3:
ZooKeeper ZOOKEEPER-1205

Add a unit test for Kerberos Ticket-Granting Ticket (TGT) renewal

Improvement Open Major Unresolved Unassigned Eugene Joseph Koontz Eugene Joseph Koontz 27/Sep/11 01:04   05/Feb/20 07:16     3.7.0, 3.5.8 tests   0 1   Create a unit test to test Kerberos ticket renewal.

Note that testing Kerberos-related functionality in Java requires that a default Kerberos configuration file be available. The location of this file can be set with the java.security.krb5.conf property (see http://download.oracle.com/javase/1.4.2/docs/guide/security/jgss/tutorials/KerberosReq.html ). For more background on Java and Kerberos, see http://download.oracle.com/javase/1,5.0/docs/guide/security/jgss/single-signon.html . For discussion about TGT renewal, see http://freeipa.org/page/Automatic_Ticket_Renewal .

Mahadev Konar writes: "Mockito would be very helpful here."
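One way to get some unit-test coverage without a KDC, sketched under assumptions: pull the renewal timing into a pure function and test it directly, mocking the Kerberos parts separately (e.g. with Mockito). TgtRefreshPolicy and its 0.8 renewal window are hypothetical, not taken from ZooKeeper's Login code. Tests that do touch the JDK's Kerberos classes would still need java.security.krb5.conf pointing at a test configuration, e.g. via System.setProperty.

```java
// Hypothetical, KDC-free piece of TGT renewal logic that a unit test can exercise directly.
final class TgtRefreshPolicy {
    // Assumed policy: renew once 80% of the ticket lifetime has elapsed.
    static final double RENEW_WINDOW = 0.8;

    static long nextRefresh(long startMillis, long endMillis) {
        if (endMillis <= startMillis) {
            throw new IllegalArgumentException("ticket end time must be after its start time");
        }
        return startMillis + (long) ((endMillis - startMillis) * RENEW_WINDOW);
    }
}
```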
kerberos, security 14952 No Perforce job exists for this issue. 0 42031
8 years, 4 weeks, 2 days ago 0|i07k5b:
ZooKeeper ZOOKEEPER-1204

ZOOKEEPER-1198 Shorten calls to ZooTrace

Sub-task Open Major Unresolved Thomas Koch Thomas Koch Thomas Koch 26/Sep/11 11:46   27/Oct/11 18:03           0 2   The calls to ZooTrace are kind of verbose and contain duplicated logic. This patch makes the calls as short as possible so that they do not distract that much from what's actually going on.
Calls to LOG.isTraceEnabled() are removed in many places, because this check is already done inside ZooTrace. In some places the check has been kept to avoid costly message creation.
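A sketch of the pattern described, using java.util.logging as a stand-in logger: the enabled check lives inside the helper, and a Supplier overload covers the costly-message case without a guard at the call site. Trace is a hypothetical class, not ZooTrace's actual API.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical trace helper; the level check is centralized here.
final class Trace {
    private final Logger log;

    Trace(Logger log) { this.log = log; }

    // Cheap messages: call sites no longer need their own isLoggable guard.
    void trace(String message) {
        if (log.isLoggable(Level.FINEST)) {
            log.finest(message);
        }
    }

    // Costly messages: the Supplier is only invoked when tracing is enabled.
    void trace(Supplier<String> message) {
        if (log.isLoggable(Level.FINEST)) {
            log.finest(message.get());
        }
    }
}
```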
17 No Perforce job exists for this issue. 2 42032
8 years, 23 weeks ago 0|i07k5j:
ZooKeeper ZOOKEEPER-1203

ZooKeeper systest is missing JUnit classes

Bug Closed Major Fixed Prashant Gokhale Prashant Gokhale Prashant Gokhale 23/Sep/11 18:44   23/Nov/11 14:22 29/Sep/11 14:27   3.3.4, 3.4.0, 3.5.0 tests   0 1   For running these tests, I am following instructions on https://github.com/apache/zookeeper/blob/trunk/src/java/systest/README.txt

In Step 4, when I try to run java -jar build/contrib/fatjar/zookeeper-<version>-fatjar.jar systest org.apache.zookeeper.test.system.SimpleSysTest, it throws the following error:

Exception in thread "main" java.lang.NoClassDefFoundError: junit/framework/TestCase

The problem is that zookeeper-dev-fatjar.jar does not contain the TestCase class.

Patrick Hunt suggested that adding <zipgroupfileset dir="${zk.root}/build/test/lib" includes="*.jar" /> to fatjar/build.xml should solve the problem and it does.
18 No Perforce job exists for this issue. 1 32667
8 years, 26 weeks ago
Reviewed
0|i05ydb:
ZooKeeper ZOOKEEPER-1202

Prevent certain state transitions in Java client on close(); improve exception handling and enhance client testability

Improvement Open Major Unresolved Matthias Spycher Matthias Spycher Matthias Spycher 22/Sep/11 21:56   14/Dec/19 06:07   3.4.0 3.7.0 java client   2 4   ZooKeeper.close() doesn't force the client into a CLOSED state. While the closing flag ensures that the client will close, its state may end up in CLOSED, CONNECTING or CONNECTED.
I developed a patch and in the process cleaned up a few other things primarily to enable testing of state transitions.

- ClientCnxnState is new and enforces certain state transitions
- ZooKeeper.isExpired() is new
- ClientCnxn no longer refers to ZooKeeper, WatchManager is externalized, and ClientWatchManager includes 3 new methods
- The SendThread terminates the EventThread on a call to close() via the event-of-death
- Polymorphism is used to handle internal exceptions (SendIOExceptions)
- The patch incorporates ZOOKEEPER-126.patch and prevents close() from blocking
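A minimal Java sketch of a state holder that enforces transitions, with CLOSED made terminal so that a late CONNECTING or CONNECTED cannot overwrite it after close(). ClientState and its transition table are hypothetical and do not reproduce the patch's ClientCnxnState.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical client state holder that rejects illegal transitions.
final class ClientState {
    enum State { CONNECTING, CONNECTED, CLOSED }

    private static final Map<State, Set<State>> ALLOWED = new EnumMap<>(State.class);
    static {
        ALLOWED.put(State.CONNECTING, EnumSet.of(State.CONNECTED, State.CLOSED));
        ALLOWED.put(State.CONNECTED, EnumSet.of(State.CONNECTING, State.CLOSED));
        ALLOWED.put(State.CLOSED, EnumSet.noneOf(State.class)); // terminal: close() is final
    }

    private State current = State.CONNECTING;

    synchronized State get() { return current; }

    // Returns false and keeps the current state if the transition is not allowed.
    synchronized boolean transitionTo(State next) {
        if (!ALLOWED.get(current).contains(next)) {
            return false;
        }
        current = next;
        return true;
    }
}
```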

19 No Perforce job exists for this issue. 1 2508
6 years, 1 day ago Java client 0|i00s8v:
ZooKeeper ZOOKEEPER-1201

ZOOKEEPER-1198 Clean SaslServerCallbackHandler.java

Sub-task Closed Blocker Fixed Thomas Koch Thomas Koch Thomas Koch 22/Sep/11 14:02   23/Nov/11 14:22 29/Sep/11 03:41   3.4.0, 3.5.0     0 1   Severe code style issues. 20 No Perforce job exists for this issue. 2 33315
8 years, 26 weeks ago
Reviewed
0|i062db:
ZooKeeper ZOOKEEPER-1200

ZOOKEEPER-1198 Remove obsolete DataTreeBuilder

Sub-task Resolved Major Fixed Thomas Koch Thomas Koch Thomas Koch 22/Sep/11 09:34   28/Oct/11 06:55 27/Oct/11 17:24   3.5.0     0 1   There is a DataTreeBuilder in the entire type hierarchy of ZooKeeperServer classes that is never used. 21 No Perforce job exists for this issue. 3 33316
8 years, 21 weeks, 6 days ago
Reviewed
0|i062dj:
ZooKeeper ZOOKEEPER-1199

ZOOKEEPER-1198 Make OpCode an enum

Sub-task Open Major Unresolved Thomas Koch Thomas Koch Thomas Koch 22/Sep/11 05:04   27/Oct/11 17:12           0 1   ZooDefs.OpCode is an interface with integer constants. Changing it to an enum provides type safety. See "Item 30: Use enums instead of int constants" in Effective Java. 22 No Perforce job exists for this issue. 6 42033
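A Java sketch of the enum form, showing only a subset of the operations. The numeric codes are meant to match ZooDefs.OpCode's int constants as I understand them, but they should be checked against the source, since they are part of the wire protocol and must not change.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of ZooDefs.OpCode as an enum; subset only, codes to be verified against ZooDefs.
enum OpCode {
    CREATE(1), DELETE(2), EXISTS(3), GET_DATA(4), SET_DATA(5),
    GET_CHILDREN(8), SYNC(9), PING(11),
    CREATE_SESSION(-10), CLOSE_SESSION(-11);

    private static final Map<Integer, OpCode> BY_CODE = new HashMap<>();
    static {
        for (OpCode op : values()) {
            BY_CODE.put(op.code, op);
        }
    }

    final int code; // wire value; must stay identical to the old int constant

    OpCode(int code) { this.code = code; }

    // Decodes a wire value, failing loudly instead of passing an unknown int around.
    static OpCode fromCode(int code) {
        OpCode op = BY_CODE.get(code);
        if (op == null) {
            throw new IllegalArgumentException("unknown opcode " + code);
        }
        return op;
    }
}
```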
8 years, 22 weeks ago 0|i07k5r:
ZooKeeper ZOOKEEPER-1198

Refactorings and Cleanups

Improvement Open Major Unresolved Thomas Koch Thomas Koch Thomas Koch 22/Sep/11 05:00   27/Oct/11 02:41           0 1   ZOOKEEPER-1199, ZOOKEEPER-1200, ZOOKEEPER-1201, ZOOKEEPER-1204, ZOOKEEPER-1216, ZOOKEEPER-1217, ZOOKEEPER-1221, ZOOKEEPER-1226, ZOOKEEPER-1228, ZOOKEEPER-1230, ZOOKEEPER-1231, ZOOKEEPER-1233, ZOOKEEPER-1234, ZOOKEEPER-1235, ZOOKEEPER-1244, ZOOKEEPER-1245, ZOOKEEPER-1246, ZOOKEEPER-1247, ZOOKEEPER-1248, ZOOKEEPER-1250, ZOOKEEPER-1251, ZOOKEEPER-1252, ZOOKEEPER-1253, ZOOKEEPER-1255, ZOOKEEPER-1257, ZOOKEEPER-1258, ZOOKEEPER-1259, ZOOKEEPER-1266, ZOOKEEPER-1267, ZOOKEEPER-1276, ZOOKEEPER-1279, ZOOKEEPER-1284, ZOOKEEPER-1286, ZOOKEEPER-1288 Umbrella issue for refactorings. I'll post individual refactoring steps as sub-issues. I'll also use this umbrella issue to submit previews of the full refactoring for testing by Jenkins or to ReviewBoard. 2381 No Perforce job exists for this issue. 0 42034
8 years, 22 weeks, 2 days ago cleanup, cleancode 0|i07k5z:
ZooKeeper ZOOKEEPER-1197

Incorrect socket handling of 4 letter words for NIO

Bug Resolved Critical Won't Fix Camille Fournier Camille Fournier Camille Fournier 21/Sep/11 10:37   15/May/14 18:00 15/May/14 18:00 3.3.3, 3.4.0 3.5.0 server   0 3   When transferring a large amount of information from a 4 letter word, especially in interactive mode (telnet or nc) over a slower network link, the connection can be closed before all of the data has reached the client. This is due to the way we handle nc non-interactive mode, by cancelling the selector key.

Instead of cancelling the selector key for 4-letter-words, we should instead flag the NIOServerCnxn to ignore detection of a close condition on that socket (CancelledKeyException, EndOfStreamException). Since the 4lw will close the connection immediately upon completion, this should be safe to do.

See ZOOKEEPER-737 for more details
23 No Perforce job exists for this issue. 3 32668
6 years, 24 weeks, 1 day ago We'll address the problem in ZOOKEEPER-1346 by moving the 4lws to a separate port. 0|i05ydj:
ZooKeeper ZOOKEEPER-1196

improve Kerberos name parsing and canonicalization testing

Improvement Open Major Unresolved Eugene Joseph Koontz Eugene Joseph Koontz Eugene Joseph Koontz 20/Sep/11 13:28   24/Sep/11 10:32       server, tests   0 0   Currently we are not testing Kerberos name parsing. Kerberos name parsing is error-prone because Kerberos principals are complex; see http://web.mit.edu/kerberos/krb5-1.5/krb5-1.5.4/doc/krb5-user/What-is-a-Kerberos-Principal_003f.html.

Bugs such as https://issues.apache.org/jira/browse/ZOOKEEPER-1195 would have been caught had we had better tests. Although we cannot (yet) test a full end-to-end KDC realm, we can at least test Kerberos principal syntax and semantics.
2382 No Perforce job exists for this issue. 1 42035
8 years, 26 weeks, 5 days ago security 0|i07k67:
ZooKeeper ZOOKEEPER-1195

SASL authorizedID being incorrectly set: should use getHostName() rather than getServiceName()

Bug Closed Major Fixed Eugene Joseph Koontz Eugene Joseph Koontz Eugene Joseph Koontz 20/Sep/11 11:15   01/May/13 22:29 29/Sep/11 03:42 3.4.0 3.4.0     0 2   Tom Klonikowski writes:

Hello developers,

the SaslServerCallbackHandler in trunk changes the principal name
service/host@REALM to service/service@REALM (I guess unintentionally).

lines 131-133:
if (!removeHost() && (kerberosName.getHostName() != null)) {
userName += "/" + kerberosName.getServiceName();
}

Server Log:

SaslServerCallbackHandler@115] - Successfully authenticated client:
authenticationID=fetcher/ubook@QUINZOO;
authorizationID=fetcher/ubook@QUINZOO.

SaslServerCallbackHandler@137] - Setting authorizedID:
fetcher/fetcher@QUINZOO
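A simplified Java sketch of the corrected logic: the instance part of service/host@REALM must come from the host component, not the service name. ParsedPrincipal is a hypothetical stand-in for KerberosName, and the removeHost flag mirrors the removeHost() check quoted above.

```java
// Hypothetical parser for "service/host@REALM", standing in for KerberosName.
final class ParsedPrincipal {
    final String service;
    final String host;   // null when the principal has no instance part
    final String realm;  // null when the principal has no realm

    ParsedPrincipal(String principal) {
        int at = principal.indexOf('@');
        String name = at < 0 ? principal : principal.substring(0, at);
        this.realm = at < 0 ? null : principal.substring(at + 1);
        int slash = name.indexOf('/');
        this.service = slash < 0 ? name : name.substring(0, slash);
        this.host = slash < 0 ? null : name.substring(slash + 1);
    }

    // Rebuilds the authorized id; the reported bug appended the service name here.
    String authorizedId(boolean removeHost) {
        String userName = service;
        if (!removeHost && host != null) {
            userName += "/" + host;   // fixed: host, not service
        }
        return realm == null ? userName : userName + "@" + realm;
    }
}
```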

24 No Perforce job exists for this issue. 2 32669
8 years, 26 weeks ago One-line fix for bug identified by Tom Klonikowski 0|i05ydr:
ZooKeeper ZOOKEEPER-1194

Two possible race conditions during leader establishment

Bug Closed Major Fixed Alexander Shraer Alexander Shraer Alexander Shraer 19/Sep/11 20:10   23/Nov/11 14:22 05/Nov/11 02:38   3.4.0, 3.5.0 server   0 1   Leader.getEpochToPropose() and Leader.waitForNewEpoch() act as barriers - they make sure that a leader/follower can return from calling the method only once connectingFollowers (or electingFollowers) contain a quorum. But these methods don't make sure that the leader itself is in connectingFollowers/electingFollowers. So the leader has not necessarily reached the barrier when followers pass it. This can cause the following problems:

1. If the leader is not in connectingFollowers when a LearnerHandler returns from getEpochToPropose(), then the epoch sent by the leader to the follower might be smaller than the leader's own last accepted epoch.

2. If the leader is not in electingFollowers when a LearnerHandler returns from waitForNewEpoch(), then the leader will send a NEWLEADER message to followers, and the followers will respond, but it is possible that the NEWLEADER message is not in outstandingProposals when these NEWLEADER acks arrive, which will cause the NEWLEADER acks to be dropped.


To fix this I propose to explicitly check that the leader is in connectingFollowers/electingFollowers before anyone can pass these barriers.
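A simplified Java sketch of the proposed barrier condition: passing requires a majority of the ensemble and the leader's own id among the arrivals. LeaderAwareBarrier is hypothetical; the real Leader code uses a QuorumVerifier and separate connectingFollowers/electingFollowers sets.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical quorum barrier that also requires the leader to have arrived.
final class LeaderAwareBarrier {
    private final long leaderId;
    private final int ensembleSize;
    private final Set<Long> arrived = new HashSet<>();

    LeaderAwareBarrier(long leaderId, int ensembleSize) {
        this.leaderId = leaderId;
        this.ensembleSize = ensembleSize;
    }

    private boolean open() {
        // Majority check plus the explicit leader-membership check.
        return arrived.size() > ensembleSize / 2 && arrived.contains(leaderId);
    }

    // Registers sid, then waits up to timeoutMillis for the barrier to open; returns whether it did.
    synchronized boolean arriveAndAwait(long sid, long timeoutMillis) throws InterruptedException {
        arrived.add(sid);
        notifyAll();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!open()) {   // loop also guards against spurious wakeups
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false;
            }
            wait(remaining);
        }
        return true;
    }
}
```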




25 No Perforce job exists for this issue. 3 32670
8 years, 20 weeks, 5 days ago 0|i05ydz:
ZooKeeper ZOOKEEPER-1193

Remove upgrade code

Task Resolved Minor Fixed Thomas Koch Thomas Koch Thomas Koch 19/Sep/11 08:58   03/Apr/15 02:32 20/Oct/11 12:49   3.5.0     0 1   ZOOKEEPER-5 introduced the upgrade feature in October 2008. It may be time to consider whether there are still installations in the wild that need this upgrade feature. Otherwise, the respective code can be removed.

Even if there are still old installations, couldn't they just use some ZK 3.x version to upgrade, so that we could still remove the upgrade code from trunk?
2383 No Perforce job exists for this issue. 1 33317
8 years, 22 weeks, 6 days ago
Reviewed
0|i062dr:
ZooKeeper ZOOKEEPER-1192

Leader.waitForEpochAck() checks waitingForNewEpoch instead of checking electionFinished

Bug Closed Critical Fixed Alexander Shraer Alexander Shraer Alexander Shraer 18/Sep/11 22:00   23/Nov/11 14:22 05/Nov/11 02:15   3.4.0, 3.5.0 server   0 2   ZOOKEEPER-1191 A follower/leader should block in Leader.waitForEpochAck() until either electingFollowers contains a quorum and electionFinished=true or until a timeout occurs. A timeout means that a quorum of followers didn't ack the epoch in time, which is an error.

But the check in Leader.waitForEpochAck() is "if (waitingForNewEpoch) throw..." and this will never be triggered, even if the wait statement just timed out, because Leader.getEpochToPropose() completes and sets waitingForNewEpoch to false before Leader.waitForEpochAck() is invoked.

Instead of "if (waitingForNewEpoch) throw", the condition in Leader.waitForEpochAck() should be "if (!electionFinished) throw".
The guarded block introduced in ZOOKEEPER-1191 should likewise check !electionFinished.
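Below is a minimal model of the corrected check. It assumes a simple-majority quorum and a timeout in milliseconds. `EpochAckBarrier` is a hypothetical toy class that borrows only the field names (`electingFollowers`, `electionFinished`) from the issue. The key point is that the post-wait test reads the flag that the wait is actually about:

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the epoch-ack barrier; not ZooKeeper's Leader class. */
final class EpochAckBarrier {
    private final Set<Long> electingFollowers = new HashSet<>();
    private final int ensembleSize;
    private boolean electionFinished = false; // guarded by electingFollowers

    EpochAckBarrier(int ensembleSize) {
        this.ensembleSize = ensembleSize;
    }

    /** Records sid's ack, then blocks until a quorum has acked or the timeout expires. */
    void waitForEpochAck(long sid, long timeoutMs) throws InterruptedException {
        synchronized (electingFollowers) {
            if (electionFinished) {
                return;
            }
            electingFollowers.add(sid);
            if (electingFollowers.size() > ensembleSize / 2) {
                electionFinished = true;
                electingFollowers.notifyAll();
                return;
            }
            long deadline = System.currentTimeMillis() + timeoutMs;
            long remaining = timeoutMs;
            // Re-check the awaited condition after every wakeup.
            while (!electionFinished && remaining > 0) {
                electingFollowers.wait(remaining);
                remaining = deadline - System.currentTimeMillis();
            }
            // The fix: test electionFinished, not a flag another method already cleared.
            if (!electionFinished) {
                throw new InterruptedException("Timed out waiting for a quorum to ack the epoch");
            }
        }
    }
}
```

A check on `waitingForNewEpoch` at this point can never fire, because a different method has already cleared that flag. A check on `electionFinished` fails exactly when no quorum formed before the deadline.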

26 No Perforce job exists for this issue. 3 32671
8 years, 20 weeks, 5 days ago 0|i05ye7:
ZooKeeper ZOOKEEPER-1191

ZOOKEEPER-1192 Synchronization issue - wait not in guarded block

Sub-task Resolved Minor Fixed Alexander Shraer Alexander Shraer Alexander Shraer 17/Sep/11 15:53   13/Apr/14 22:10 13/Apr/14 22:10 3.4.0 3.5.0 server   0 0   In Leader.java, getEpochToPropose() and waitForEpochAck() have the following code:

if (readyToStart && verifier.containsQuorum(electingFollowers)) {
    electionFinished = true;
    electingFollowers.notifyAll();
} else {
    electingFollowers.wait(self.getInitLimit()*self.getTickTime());
    if (waitingForNewEpoch) {
        throw new InterruptedException("Out of time to propose an epoch");
    }
}

In Java, a thread blocked in wait() can wake up without being notified, interrupted, or timing out; this is a so-called spurious wakeup. The wait should therefore be guarded by a while loop that re-checks the condition being waited for.
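The following is a reusable guarded wait. It assumes the awaited state can be read as a `BooleanSupplier` while the caller holds the lock. `GuardedWait.await` is a hypothetical helper, not a ZooKeeper or JDK method. After every wakeup it re-checks the condition and recomputes the remaining time, so a spurious wakeup is never mistaken for success or for a timeout:

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

final class GuardedWait {
    private GuardedWait() {}

    /**
     * Waits on lock until condition holds or timeoutMs elapses. The caller must
     * already hold lock's monitor. Returns the final value of the condition.
     */
    static boolean await(Object lock, BooleanSupplier condition, long timeoutMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!condition.getAsBoolean()) {
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMs <= 0) {
                return false; // timed out with the condition still false
            }
            // May return on notify, on timeout, or spuriously; the loop re-checks.
            lock.wait(remainingMs);
        }
        return true;
    }
}
```

Inside `synchronized (electingFollowers)`, a caller would pass `() -> electionFinished` and throw its "Out of time" exception when the helper returns false. Because `wait(0)` means "wait forever", the loop returns before it ever calls `wait` with a non-positive timeout.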


2384 No Perforce job exists for this issue. 2 42036
8 years, 27 weeks ago 0|i07k6f:
ZooKeeper ZOOKEEPER-1190

ant package is not including many of the bin scripts in the package (zkServer.sh for example)

Bug Closed Blocker Fixed Eric Yang Patrick D. Hunt Patrick D. Hunt 16/Sep/11 20:30   23/Nov/11 14:22 07/Oct/11 16:48 3.4.0, 3.5.0 3.4.0, 3.5.0 build   0 2   Run "ant package" and look in the build/zookeeper-<version>/bin directory: many of the bin scripts are missing.
161 No Perforce job exists for this issue. 2 32672
8 years, 24 weeks, 5 days ago 0|i05yef:
ZooKeeper ZOOKEEPER-1189

For an invalid snapshot file (less than 10 bytes in size), the RandomAccessFile stream is leaked.

Bug Closed Major Fixed Rakesh Radhakrishnan Rakesh Radhakrishnan Rakesh Radhakrishnan 16/Sep/11 10:29   23/Nov/11 14:22 26/Sep/11 21:11 3.3.3 3.3.4, 3.4.0, 3.5.0 server   0 3   When loading a snapshot, ZooKeeper considers only snapshots that are at least 10 bytes in size. For a smaller file, it ignores the file and simply returns without closing the RandomAccessFile.

{noformat}
Util.isValidSnapshot() has the following logic:
// Check for a valid snapshot
RandomAccessFile raf = new RandomAccessFile(f, "r");
// including the header and the last / bytes
// the snapshot should be atleast 10 bytes
if (raf.length() < 10) {
    return false;
}
{noformat}

Because this length check sits outside the try block, the early return never reaches the finally block, so the RandomAccessFile is leaked.

Suggestion: move the length check inside the try block so that the finally block always closes the file.
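Here is a sketch of the suggested fix using try-with-resources (Java 7+), which closes the file on every exit path, including the early return. `SnapshotCheck.hasMinimumLength` is a hypothetical name. Only the 10-byte length check from the issue is reproduced, and the rest of the real validation is omitted:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

final class SnapshotCheck {
    private SnapshotCheck() {}

    /** Minimum snapshot size quoted in the issue (header plus trailing bytes). */
    static final long MIN_SNAPSHOT_LENGTH = 10;

    /** Leak-free length check: raf is closed whether we return false, true, or throw. */
    static boolean hasMinimumLength(File f) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            if (raf.length() < MIN_SNAPSHOT_LENGTH) {
                return false; // still closed by try-with-resources
            }
            // Further validation (e.g. of the trailing bytes) would go here.
            return true;
        }
    }
}
```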
27 No Perforce job exists for this issue. 3 32673
8 years, 26 weeks, 2 days ago
Incompatible change, Reviewed
0|i05yen:
ZooKeeper ZOOKEEPER-1188

client should detect a broken-network itself

Wish Open Major Unresolved Unassigned helei helei 15/Sep/11 22:47   15/Sep/11 22:47   3.3.3   c client   0 0   The client receives the session-expired event only after its connection to the servers has recovered. I think the client should detect expiration itself once it has been disconnected for a full session-expiration period. Why do we always wait for the message from the servers? 2385 No Perforce job exists for this issue. 0 42037
8 years, 27 weeks, 6 days ago 0|i07k6n:
ZooKeeper ZOOKEEPER-1187

remove jdk dependency from the rpm spec

Improvement Resolved Major Won't Fix Giridharan Kesavan Giridharan Kesavan Giridharan Kesavan 15/Sep/11 13:53   03/Mar/16 11:21 03/Mar/16 11:21         0 0   remove jdk dependency from the rpm spec 2386 No Perforce job exists for this issue. 1 42038
4 years, 3 weeks ago 0|i07k6v:
ZooKeeper ZOOKEEPER-1186

ZooKeeper client seems to hang quietly on OutOfMemoryError

Bug Resolved Major Duplicate Unassigned Stepan Koltsov Stepan Koltsov 15/Sep/11 09:21   01/Nov/11 12:06 01/Nov/11 12:06 3.3.3   java client   0 0   ZooKeeper client seems to hang quietly on OutOfMemoryError.

Look at code of ClientCnxn.SendThread.run:

{code}
void run() {
    while (zooKeeper.state.isAlive()) {
        try {
            ...
        } catch (Exception e) {
            // handle exception and restart
        }
    }
    ...
}
{code}

If an OutOfMemoryError is thrown anywhere inside the try block, catch (Exception e) does not catch it, so the thread simply exits and the ZooKeeper client hangs.

The client should handle any Throwable the same way it handles Exception.
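The following sketch shows a loop that handles any `Throwable` the same way it handles `Exception`. `ResilientLoop` and its `Step` interface are hypothetical stand-ins for `ClientCnxn.SendThread`, and "handling" is reduced to recording the failure and looping again:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for the SendThread loop; not ZooKeeper's class. */
final class ResilientLoop implements Runnable {
    interface Step {
        void runOnce() throws Exception;
    }

    private final Step step;
    private final AtomicBoolean alive = new AtomicBoolean(true);
    private final AtomicReference<Throwable> lastFailure = new AtomicReference<>();

    ResilientLoop(Step step) {
        this.step = step;
    }

    void stop() {
        alive.set(false);
    }

    Throwable lastFailure() {
        return lastFailure.get();
    }

    @Override
    public void run() {
        while (alive.get()) {
            try {
                step.runOnce();
            } catch (Throwable t) { // Errors such as OutOfMemoryError land here too
                lastFailure.set(t);   // "handle exception and restart": record, then loop
            }
        }
    }
}
```

Whether it is wise to keep looping after an `OutOfMemoryError` is a separate judgment call. An alternative design is to shut the connection down and surface the failure to callers, so that they at least see an error instead of a silent hang.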
2387 No Perforce job exists for this issue. 0 32674
8 years, 25 weeks, 1 day ago 0|i05yev:
ZooKeeper ZOOKEEPER-1185

Send AuthFailed event to client if SASL authentication fails

Bug Closed Major Fixed Eugene Joseph Koontz Eugene Joseph Koontz Eugene Joseph Koontz 14/Sep/11 21:11   01/May/13 22:29 26/Sep/11 22:09 3.4.0 3.4.0, 3.5.0 java client   0 2   There are three places where ClientCnxn should queue an AuthFailed event if the client fails to authenticate. Without this event, clients may be stuck waiting for a SaslAuthenticated event that will never come (since the client failed to authenticate).

kerberos, security 28 No Perforce job exists for this issue. 2 32675
8 years, 26 weeks, 2 days ago This patch fixes SaslAuthFailTest.testBadSaslAuthNotifiesWatch() so that it tests for the AuthFailed event; previously, the test did not check for this event at all.

It also removes the testBadSaslAuthNotifiesWatch() method from the SaslAuthTest class, because this method belongs in SaslAuthFailTest, not SaslAuthTest. The former tests unsuccessful SASL authentication; the latter tests successful SASL authentication.
0|i05yf3:
ZooKeeper ZOOKEEPER-1184

jute generated files are not being cleaned up via "ant clean"

Bug Resolved Major Fixed Thomas Koch Patrick D. Hunt Patrick D. Hunt 14/Sep/11 18:48   17/Sep/11 06:56 16/Sep/11 20:25 3.5.0 3.5.0 build   0 2   The change for ZOOKEEPER-96 removed the generated files from SVN. Should these files now live under the build subdir? If this change is made, be sure that the C/contrib/recipes environment is not broken... 3960 No Perforce job exists for this issue. 1 32676
8 years, 27 weeks, 5 days ago
Reviewed
0|i05yfb:
ZooKeeper ZOOKEEPER-1183

Enhance LogFormatter to output additional detail from transaction log

Improvement Patch Available Minor Unresolved kishore gopalakrishna kishore gopalakrishna kishore gopalakrishna 14/Sep/11 14:57   10/Oct/13 19:32   3.4.0       0 0   The current LogFormatter prints the following information:
ZooKeeper Transactional Log File with dbid 0 txnlog format version 2
8/15/11 1:55:36 PM PDT session 0x131cf1a236f0014 cxid 0x0 zxid 0xf01 createSession
8/15/11 1:55:57 PM PDT session 0x131cf1a236f0000 cxid 0x55f zxid 0xf02 setData
8/15/11 1:56:00 PM PDT session 0x131cf1a236f0015 cxid 0x0 zxid 0xf03 createSession
...
..
8/15/11 2:00:33 PM PDT session 0x131cf1a236f001c cxid 0x36 zxid 0xf6b setData
8/15/11 2:00:33 PM PDT session 0x131cf1a236f0021 cxid 0xa1 zxid 0xf6c create
8/15/11 2:00:33 PM PDT session 0x131cf1a236f001b cxid 0x3e zxid 0xf6d setData
8/15/11 2:00:33 PM PDT session 0x131cf1a236f001e cxid 0x3e zxid 0xf6e setData
8/15/11 2:00:33 PM PDT session 0x131cf1a236f001d cxid 0x41 zxid 0xf6f setData

Though this is good information, it does not include details such as:
createSession: which IP created the session, and its timeout
set|get|delete: the path and data
create: the path created and its create mode, along with the data

We can add a -detail parameter that prints detailed output for each transaction.

Outputting the data is slightly tricky, since we can't print it without understanding its format. We need not print it for now.


2388 No Perforce job exists for this issue. 1 42039
6 years, 24 weeks ago 0|i07k73:
ZooKeeper ZOOKEEPER-1182

Make findbugs usable in Eclipse

Task Resolved Minor Fixed Thomas Koch Thomas Koch Thomas Koch 14/Sep/11 05:42   17/Sep/11 06:56 16/Sep/11 20:13   3.5.0     0 1   I could not find any way to tell the Eclipse FindBugs extension to ignore the Java files under src/java/test. I already use src/java/test/config/findbugsExcludeFile.xml, but there are still many FindBugs warnings.

So this patch fixes the most obvious FindBugs warnings under src/java/test. There are 30 remaining warnings, which could either be ignored in the exclude file or fixed by somebody with more knowledge of the code.
3961 No Perforce job exists for this issue. 2 33318
8 years, 27 weeks, 5 days ago
Reviewed
cleanup, cleancode 0|i062dz:
ZooKeeper ZOOKEEPER-1181

Fix problems with Kerberos TGT renewal

Bug Closed Major Fixed Eugene Joseph Koontz Eugene Joseph Koontz Eugene Joseph Koontz 13/Sep/11 19:38   01/May/13 22:29 24/Oct/11 02:47 3.4.0 3.4.0, 3.5.0 java client, server   0 3   Currently, in ZooKeeper trunk, there are two problems with Kerberos TGT renewal:

1. TGTs obtained from a keytab are not refreshed periodically. They should be, just as those from ticket cache are refreshed.

2. Ticket renewal should be retried if it fails. Ticket renewal might fail if two or more separate processes (different JVMs) running as the same user try to renew Kerberos credentials at the same time.
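Retrying the renewal can be sketched as a generic helper. The sketch assumes the renewal itself is modeled as a `Callable`. `Retry.withRetries`, the attempt count, and the fixed backoff are illustrative assumptions, not the API or values used by the actual patch. The helper takes no lock itself, and callers should likewise avoid holding one while it sleeps, because sleeping while holding a lock is exactly what FindBugs flags:

```java
import java.util.concurrent.Callable;

final class Retry {
    private Retry() {}

    /**
     * Calls action up to maxAttempts times, sleeping backoffMs between
     * failures, and rethrows the last failure if every attempt fails.
     */
    static <T> T withRetries(Callable<T> action, int maxAttempts, long backoffMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e; // e.g. another JVM renewing the same credentials concurrently
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs);
                }
            }
        }
        throw last; // non-null: the loop ran at least once and every attempt failed
    }
}
```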
kerberos, security 29 No Perforce job exists for this issue. 3 32677
8 years, 22 weeks, 3 days ago -Fixes two findbugs warnings related to holding a lock while sleeping.
-Addresses Camille's point: merge two almost-identical retry methods into a single retry method.
Reviewed
0|i05yfj:
ZooKeeper ZOOKEEPER-1180

New entry for files ignored by svn.

Improvement Resolved Minor Implemented Patrick D. Hunt Warren Turkal Warren Turkal 13/Sep/11 19:32   23/Oct/13 07:09 22/Oct/13 19:26         0 1 900 900 0% The following entry needs to be added to the svn:ignore property for src/java/lib:
ant-eclipse-*.jar

This will ignore the ant-eclipse-*.jar file which is downloaded when running the ant "eclipse" target.
0% 0% 900 900 2389 No Perforce job exists for this issue. 0 42040
6 years, 22 weeks, 1 day ago 0|i07k7b:
ZooKeeper ZOOKEEPER-1179

NettyServerCnxn does not properly close socket on 4 letter word requests

Bug Closed Critical Fixed Rakesh Radhakrishnan Camille Fournier Camille Fournier 13/Sep/11 12:20   13/Mar/14 14:17 11/Feb/14 20:18 3.4.0 3.4.6, 3.5.0 server   0 6   When a four-letter-word command is sent to a server configured to use NettyServerCnxnFactory, the factory does not properly cancel all the keys and close the socket after sending the 4lw response. The close request throws the following exception, and the thread does not shut down:
2011-09-13 12:14:17,546 - WARN [New I/O server worker #1-1:NettyServerCnxnFactory$CnxnChannelHandler@117] - Exception caught [id: 0x009300cc, /1.1.1.1:38542 => /139.172.114.138:2181] EXCEPTION: java.io.IOException: A non-blocking socket operation could not be completed immediately
java.io.IOException: A non-blocking socket operation could not be completed immediately
at sun.nio.ch.SocketDispatcher.close0(Native Method)
at sun.nio.ch.SocketDispatcher.preClose(SocketDispatcher.java:44)
at sun.nio.ch.SocketChannelImpl.implCloseSelectableChannel(SocketChannelImpl.java:684)
at java.nio.channels.spi.AbstractSelectableChannel.implCloseChannel(AbstractSelectableChannel.java:201)
at java.nio.channels.spi.AbstractInterruptibleChannel.close(AbstractInterruptibleChannel.java:97)
at org.jboss.netty.channel.socket.nio.NioWorker.close(NioWorker.java:593)
at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.handleAcceptedSocket(NioServerSocketPipelineSink.java:119)
at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.eventSunk(NioServerSocketPipelineSink.java:76)
at org.jboss.netty.channel.Channels.close(Channels.java:720)
at org.jboss.netty.channel.AbstractChannel.close(AbstractChannel.java:208)
at org.apache.zookeeper.server.NettyServerCnxn.close(NettyServerCnxn.java:116)
at org.apache.zookeeper.server.NettyServerCnxn.cleanupWriterSocket(NettyServerCnxn.java:241)
at org.apache.zookeeper.server.NettyServerCnxn.access$0(NettyServerCnxn.java:231)
at org.apache.zookeeper.server.NettyServerCnxn$CommandThread.run(NettyServerCnxn.java:314)
at org.apache.zookeeper.server.NettyServerCnxn$CommandThread.start(NettyServerCnxn.java:305)
at org.apache.zookeeper.server.NettyServerCnxn.checkFourLetterWord(NettyServerCnxn.java:674)
at org.apache.zookeeper.server.NettyServerCnxn.receiveMessage(NettyServerCnxn.java:791)
at org.apache.zookeeper.server.NettyServerCnxnFactory$CnxnChannelHandler.processMessage(NettyServerCnxnFactory.java:217)
at org.apache.zookeeper.server.NettyServerCnxnFactory$CnxnChannelHandler.messageReceived(NettyServerCnxnFactory.java:141)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:350)
at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:281)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:201)
at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
2390 No Perforce job exists for this issue. 3 32678
6 years, 2 weeks ago Thanks Rakesh, you are right, this error is not happening anymore. Flavio, I'm closing this. 0|i05yfr:
ZooKeeper ZOOKEEPER-1178

Add eclipse target for supporting Apache IvyDE

Improvement Patch Available Minor Unresolved Warren Turkal Warren Turkal Warren Turkal 12/Sep/11 12:03   05/Feb/20 07:12     3.7.0, 3.5.8 build   1 5 3600 3600 0% Mac OS X w/ Eclipse 3.7. However, I believe this will work in any Eclipse environment. This patch adds support for Eclipse with Apache IvyDE, which is the extension that integrates Ivy support into Eclipse. This allows the creation of what appear to be fully portable .eclipse and .classpath files. I will be posting a patch shortly. 0% 0% 3600 3600 2391 No Perforce job exists for this issue. 1 2442
3 years, 39 weeks, 2 days ago Add support for Eclipse with Apache IvyDE extension 0|i00ru7:
ZooKeeper ZOOKEEPER-1177

Enabling a large number of watches for a large number of clients

Improvement Resolved Major Fixed Fangmin Lv Vishal Kathuria Vishal Kathuria 09/Sep/11 18:59   28/Sep/18 19:20 28/Sep/18 17:38 3.3.3 3.6.0 server   1 21 0 49800   In my ZooKeeper deployment, I see the watch manager consuming several GB of memory, so I dug a bit deeper.

In the scenario I am testing, I have 10K clients connected to an observer. There are about 20K znodes in ZooKeeper, each about 1 KB, so about 20 MB of data in total.
Each client fetches and sets watches on all the znodes: 10K clients × 20K znodes = 200 million watches.

It seems a single watch takes about 100 bytes. I am currently at 14528037 watches and, according to the YourKit profiler, WatchManager already holds 1.2 GB. This is not going to work, as it might end up needing 20 GB of RAM just for the watches.

So we need a more compact way of storing watches. Here are the possible solutions:
1. Use a bitmap instead of the current hashmap. In this approach, each znode would get a unique id when it is created. For every session, we keep a bitmap that indicates the set of znodes the session is watching. Assuming 100K znodes, a bitmap would be about 12.5 KB. For 10K sessions, we can track watches using about 125 MB instead of 20 GB.
2. The second idea is based on the observation that clients watch znodes in sets (for example, all znodes under a folder). Multiple clients watch the same set, and the total number of sets is a couple of orders of magnitude smaller than the total number of znodes. In my scenario, there are about 100 sets. So instead of tracking watches at the znode level, track them at the set level. This may mean that get also needs to be implemented at the set level. With this, we can store the watches in about 100 MB.
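Idea 1 can be sketched as follows. The sketch assumes znode ids are dense, non-negative integers assigned at creation time. `BitmapWatchTable` is a hypothetical class, not the WatchManager API:

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** One BitSet per session; bit i set means "this session watches znode id i". */
final class BitmapWatchTable {
    private final Map<Long, BitSet> watchesBySession = new HashMap<>();

    void addWatch(long sessionId, int znodeId) {
        watchesBySession.computeIfAbsent(sessionId, id -> new BitSet()).set(znodeId);
    }

    boolean isWatching(long sessionId, int znodeId) {
        BitSet bits = watchesBySession.get(sessionId);
        return bits != null && bits.get(znodeId);
    }

    /** Fires a one-shot watch: clears the bit in every watching session, returns how many fired. */
    int triggerWatch(int znodeId) {
        int fired = 0;
        for (BitSet bits : watchesBySession.values()) {
            if (bits.get(znodeId)) {
                bits.clear(znodeId);
                fired++;
            }
        }
        return fired;
    }
}
```

The arithmetic behind the estimate is as follows: 100,000 znodes need 100,000 bits, which is 12,500 bytes per session, and 10,000 sessions need about 125 MB. The trade-off is that `triggerWatch` must scan every session's bitmap. Firing a watch therefore costs O(number of sessions) rather than a single hash lookup.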


Are there any other suggestions for solutions?

Thanks


100% 100% 49800 0 pull-request-available 2392 No Perforce job exists for this issue. 5 42041
1 year, 24 weeks, 6 days ago Changes to the watch manager to support a very large number of watches (200 million). This change also improves the synchronization in the WatchManager to reduce contention on various watch manager operations (mainly addWatch(), which is a fairly common operation after a watch is triggered).
Reviewed
0|i07k7j:
ZooKeeper ZOOKEEPER-1176

Remove dead code and basic cleanup in DataTree

Task Resolved Major Fixed Thomas Koch Thomas Koch Thomas Koch 09/Sep/11 11:44   17/Sep/11 06:56 16/Sep/11 20:36   3.5.0     0 1   - DataTree members scount, initialized and method listACLEquals are never used
- transform if(!C) B else A to if(C) A else B (removes one negation the reader has to follow)
- remove unused imports and one annotation
- add method getApproximateDataSize to DataNode (I work towards an immutable DataNode without public properties)
- move assignments (lastPrefix = getMaxPrefixWithQuota(path)) out of if statements
- combine nested if statements: if (A) { if (B) C } => if (A && B) C
- make ACL maps private and add getAclSize() to hide implementation details of the ACLs.
3962 No Perforce job exists for this issue. 5 33319
8 years, 27 weeks, 5 days ago
Reviewed
cleanup, cleancode 0|i062e7:
ZooKeeper ZOOKEEPER-1175

DataNode references parent node for no reason

Improvement Resolved Minor Fixed Thomas Koch Thomas Koch Thomas Koch 08/Sep/11 14:53   15/Sep/11 06:56 14/Sep/11 18:55   3.5.0     0 1   Referencing the parent in a node makes building the tree harder than it needs to be. With the parent reference, you need to get the parent before you can create the DataNode. Without it, one can have a method such as tree.put(String path, new DataNode(...)). 3963 No Perforce job exists for this issue. 2 33320
8 years, 28 weeks ago
Reviewed
0|i062ef:
ZooKeeper ZOOKEEPER-1174

FD leak when network unreachable

Bug Closed Critical Fixed Ted Dunning Ted Dunning Ted Dunning 08/Sep/11 14:47   23/Nov/11 14:22 30/Sep/11 18:02 3.3.3 3.3.4, 3.4.0, 3.5.0 java client   0 2   In the socket connection logic, there are several errors that result in bad behavior. The basic problem is that a socket is registered with a selector unconditionally, even though two cases need special handling. First, the socket may connect immediately. Second, the connect call may throw an exception. In either of these cases, I don't think the socket should be registered.
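The following sketch handles both cases with the standard `java.nio` API. `ConnectHelper.connect` is a hypothetical helper. Registering for `OP_READ` after an immediate connect is an assumption about what the caller wants to do next:

```java
import java.io.IOException;
import java.net.SocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

final class ConnectHelper {
    private ConnectHelper() {}

    /**
     * Starts a non-blocking connect and registers the channel only for the
     * interest it actually needs. If connect() throws, the channel is closed
     * and nothing is registered, so no file descriptor leaks.
     */
    static SelectionKey connect(Selector selector, SocketAddress addr) throws IOException {
        SocketChannel sc = SocketChannel.open();
        try {
            sc.configureBlocking(false); // required before register()
            if (sc.connect(addr)) {
                // Connected immediately: there is no pending connect to finish.
                return sc.register(selector, SelectionKey.OP_READ);
            }
            // Connection pending: wait for OP_CONNECT, then call finishConnect().
            return sc.register(selector, SelectionKey.OP_CONNECT);
        } catch (IOException | RuntimeException e) {
            sc.close(); // e.g. UnresolvedAddressException or a refused connection
            throw e;
        }
    }
}
```

Closing the channel on the failure path prevents the descriptor leak named in the issue title. The immediate-connect branch avoids waiting for a connect event on a connection that is already established.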

I will attach a test case that demonstrates the problem. I have been unable to create a unit test that exhibits the problem because I would have to mock the low level socket libraries to do so. It would still be good to do so if somebody can figure out a good way.
167 No Perforce job exists for this issue. 8 32679
8 years, 21 weeks, 2 days ago 0|i05yfz:
ZooKeeper ZOOKEEPER-1173

Server never forgets old ACL lists

Bug Open Major Unresolved Thomas Koch Thomas Koch Thomas Koch 07/Sep/11 10:47   07/Sep/11 10:49       server   0 1   The ACL stuff in DataTree.java reimplements a kind of reference system. The idea may have been to save memory for equal ACL lists. However there's no code that ever removes an ACL list that is not used anymore.

Related:
- The ACL stuff could be in a separate class so that DataTree.java is not such a big beast anymore.
- It's risky to have mutable objects (list) as keys in a HashMap.

One way to solve this: keep the ACL lists as members of the DataTree nodes, and look up existing ACL lists in a java.util.WeakHashMap.
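The WeakHashMap idea can be sketched as an interner. This minimal sketch is generic over the element type, which would be ZooKeeper's ACL class in practice, and it assumes the elements themselves are immutable. `AclInterner` is a hypothetical name. The cached value is a `WeakReference` because a strong value that points at its own key would keep every entry alive forever, which is the very leak being fixed:

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;

/** Shares one canonical immutable list per distinct ACL list; unused entries are collectable. */
final class AclInterner<T> {
    private final Map<List<T>, WeakReference<List<T>>> cache = new WeakHashMap<>();

    synchronized List<T> intern(List<T> acls) {
        WeakReference<List<T>> ref = cache.get(acls);
        List<T> existing = (ref == null) ? null : ref.get();
        if (existing != null) {
            return existing; // reuse the shared instance
        }
        // Immutable copy: mutable lists are unsafe as hash-map keys.
        List<T> canonical = Collections.unmodifiableList(new ArrayList<>(acls));
        cache.put(canonical, new WeakReference<>(canonical));
        return canonical;
    }
}
```

Once no node holds the canonical list any more, the garbage collector can reclaim it, and its map entry disappears with it, without any explicit reference counting.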
acl 2393 No Perforce job exists for this issue. 0 32680
8 years, 29 weeks, 1 day ago 0|i05yg7:
ZooKeeper ZOOKEEPER-1172

Support for custom org.apache.zookeeper.client.HostProvider implementation.

Improvement Patch Available Major Unresolved César Álvarez Núñez César Álvarez Núñez César Álvarez Núñez 06/Sep/11 10:41   14/Dec/19 06:09     3.7.0 java client   1 3   The interface org.apache.zookeeper.client.HostProvider exists, but the ZooKeeper constructor hard-codes org.apache.zookeeper.client.StaticHostProvider.

With this patch, it can be replaced by any other implementation simply by calling the new ZooKeeper constructors that accept a HostProvider as a parameter.
30 No Perforce job exists for this issue. 3 2507
5 years, 50 weeks ago Support for custom org.apache.zookeeper.client.HostProvider implementation with the help of new Zookeeper constructor methods. 0|i00s8n:
ZooKeeper ZOOKEEPER-1171

fix build for java 7

Bug Closed Minor Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 02/Sep/11 15:35   23/Nov/11 14:22 13/Sep/11 17:50 3.4.0 3.4.0 build   0 1   I tried testing out zk on java 7 (not yet officially supported) but I ran into a road block due to the build failing. Patch coming next. 3964 No Perforce job exists for this issue. 1 32681
8 years, 28 weeks ago
Reviewed
0|i05ygf:
ZooKeeper ZOOKEEPER-1170

Fix compiler (eclipse) warnings: unused imports, unused variables, missing generics

Improvement Resolved Minor Fixed Thomas Koch Thomas Koch Thomas Koch 02/Sep/11 15:12   15/Sep/11 06:56 14/Sep/11 18:23   3.5.0     0 1   IDE warnings become useless if there are too many of them. This issue and patch fix nearly all of the remaining ones. 3965 No Perforce job exists for this issue. 1 33321
8 years, 28 weeks ago
Reviewed
cleanup, cleancode 0|i062en:
ZooKeeper ZOOKEEPER-1169

Fix compiler (eclipse) warnings in (generated) jute code

Improvement Closed Minor Fixed Thomas Koch Thomas Koch Thomas Koch 02/Sep/11 14:39   23/Nov/11 14:22 02/Sep/11 16:54   3.4.0     0 0   Fixes for compiled jute parser:
- missing generic types
- added @SuppressWarnings("unused") because javacc adds a dead throws clause at the end of functions.

Fixes for code compiled by jute compiler:
- remove import java.util.* and use full ref to java.util.Arrays

One warning fixed in non-compiled code:
src/java/main/org/apache/jute/compiler/JRecord.java

Rationale: The warnings in your IDE (Eclipse) become useless if there are tons of them. This patch removes many of them; another issue with a patch will reduce them to 8.
3966 No Perforce job exists for this issue. 1 33322
8 years, 29 weeks, 5 days ago
Reviewed
cleanup, cleancode 0|i062ev:
ZooKeeper ZOOKEEPER-1168

ZooKeeper fails to run with IKVM

Bug Closed Major Fixed Andrew Finnell Andrew Finnell Andrew Finnell 31/Aug/11 15:52   23/Nov/11 14:22 01/Sep/11 13:15 3.4.0 3.4.0 jmx   0 1 86400 86400 0% All Architectures. Running with IKVM and OpenJDK instead of Sun JDK 6. OS: Windows 64-bit
JRE: IKVM 7.0.4258

IKVM 7.0.4258 does not support ManagementFactory.getPlatformMBeanServer(); it throws a java.lang.Error.
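The following sketch tolerates a runtime that has no platform MBean server. `ManagementFactory.getPlatformMBeanServer()` is the real JDK call, and `JmxSupport.platformMBeanServerOrNull` is a hypothetical wrapper. The catch of `Error` is broad only because that is what the issue reports IKVM throwing, and a production fix might narrow it:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;

final class JmxSupport {
    private JmxSupport() {}

    /**
     * Returns the platform MBean server, or null when the runtime cannot
     * provide one. Callers would skip MBean registration on null.
     */
    static MBeanServer platformMBeanServerOrNull() {
        try {
            return ManagementFactory.getPlatformMBeanServer();
        } catch (Error e) {
            // Reported behavior on IKVM 7.0.4258; treat JMX as unavailable.
            return null;
        }
    }
}
```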
0% 0% 86400 86400 3967 No Perforce job exists for this issue. 2 32682
8 years, 29 weeks, 6 days ago
Reviewed
0|i05ygn:
ZooKeeper ZOOKEEPER-1167

C api lacks synchronous version of sync() call.

Bug Reopened Major Unresolved Marshall McMullen Nicholas Harteau Nicholas Harteau 31/Aug/11 10:22   05/Feb/20 07:15   3.3.3, 3.4.3, 3.5.0 3.7.0, 3.5.8 c client   2 7   Reading through the source, I see that the C API implements zoo_async(), which provides ZooKeeper's sync() operation in the multithreaded/asynchronous C API. Nothing equivalent is implemented in the non-multithreaded API.

I'm not sure whether this was an oversight or intentional, but it means that the non-multithreaded API can't guarantee consistent client views on critical reads.

The zkperl bindings depend on the synchronous, non-multithreaded API so also can't call sync() currently.
2394 No Perforce job exists for this issue. 1 32683
2 years, 44 weeks, 1 day ago 0|i05ygv:
ZooKeeper ZOOKEEPER-1166

Please add a few svn:ignore properties

Improvement Closed Minor Fixed Patrick D. Hunt Warren Turkal Warren Turkal 29/Aug/11 19:59   23/Nov/11 14:22 01/Sep/11 16:43 3.4.0 3.4.0 build   0 0 3600 3600 0% Please add a couple svn:ignore properties to make dealing with the code slightly easier.

At the root, please add an svn:ignore property for "build" so that the default build directory for eclipse is excluded.

At src/java/lib, please add an svn:ignore property for "*.jar" so that jars acquired by ivy are ignored.
0% 0% 3600 3600 3968 No Perforce job exists for this issue. 0 33323
8 years, 28 weeks, 2 days ago
Reviewed
0|i062f3:
ZooKeeper ZOOKEEPER-1165

better eclipse support in tests

Bug Closed Minor Fixed Warren Turkal Warren Turkal Warren Turkal 29/Aug/11 18:08   23/Nov/11 14:22 02/Sep/11 13:02 3.4.0 3.4.0 tests   0 2 3600 3600 0% Eclipse The Eclipse test runner tries to run tests from all classes that inherit from TestCase. However, at least one such class (org.apache.zookeeper.test.system.BaseSysTest) contains no test cases, because it serves as infrastructure for other, real test cases. This patch annotates that class with @Ignore, which makes the runner skip it. Because annotations are not inherited by default, this patch does not affect subclasses of that class. 0% 0% 3600 3600 patch 3969 No Perforce job exists for this issue. 1 32684
8 years, 29 weeks, 6 days ago Small Eclipse test fix.
Reviewed
0|i05yh3:
ZooKeeper ZOOKEEPER-1164

Support encryption for C binding

New Feature Open Major Unresolved Unassigned Eric Yang Eric Yang 29/Aug/11 13:59   01/May/13 22:29   3.5.0   c client   0 0   If ZooKeeper is going to switch to netty for connections to support encryption, then C binding library and other language bindings should be updated to support communication through netty to support encryption. 2395 No Perforce job exists for this issue. 0 42042
8 years, 30 weeks, 3 days ago 0|i07k7r:
ZooKeeper ZOOKEEPER-1163

Memory leak in zk_hashtable.c:do_insert_watcher_object()

Bug Resolved Major Fixed Anupam Chanda Anupam Chanda Anupam Chanda 25/Aug/11 13:47   02/Mar/16 20:36 25/Jun/12 14:09 3.3.3 3.3.6, 3.4.4, 3.5.0 c client   0 3   zk_hashtable.c:do_insert_watcher_object() line number 193 calls add_to_list with clone flag set to 1. This leaks memory, since the original watcher object was already allocated on the heap by activateWatcher() line 330.

I will upload a patch shortly. The fix is to set clone flag to 0 in the call to add_to_list().
2396 No Perforce job exists for this issue. 1 32685
7 years, 39 weeks, 2 days ago 0|i05yhb:
ZooKeeper ZOOKEEPER-1162

consistent handling of jute.maxbuffer when attempting to read large zk "directories"

Improvement Open Major Unresolved Michael Han Jonathan Hsieh Jonathan Hsieh 25/Aug/11 01:36   14/Dec/19 06:08   3.3.3 3.7.0 server   12 25   Recently we encountered a situation where a zk directory was successfully populated with 250k elements. When our system attempted to read the znode directory listing, it failed because the contents exceeded the default 1 MB jute.maxbuffer limit. There were a few odd things:

1) It seems odd that we could populate a directory to be very large but could not read its listing.
2) The workaround was bumping up jute.maxbuffer on the client side.
Would it make more sense to reject adding new znodes if the listing would exceed jute.maxbuffer?
Alternatively, would it make sense for directory listings to ignore the jute.maxbuffer setting?
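A rough size estimate shows why the listing failed. This sketch assumes that jute's binary encoding writes a 4-byte element count and then, for each string, a 4-byte length followed by its UTF-8 bytes. It ignores response headers, so the result is a lower bound. `ChildListSize` is a hypothetical class, and 0xfffff is the default given for jute.maxbuffer in the ZooKeeper administrator's guide:

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

final class ChildListSize {
    private ChildListSize() {}

    /** Default jute.maxbuffer, just under 1 MB (assumed 0xfffff). */
    static final int DEFAULT_JUTE_MAXBUFFER = 0xfffff;

    /** Lower-bound estimate of a serialized list of child names (headers ignored). */
    static long estimateBytes(List<String> names) {
        long total = 4; // vector element count
        for (String name : names) {
            total += 4 + name.getBytes(StandardCharsets.UTF_8).length; // length prefix + bytes
        }
        return total;
    }
}
```

With 250,000 children, the length prefixes alone take 1,000,000 bytes. Even one-character names therefore give 250,000 × 5 + 4 = 1,250,004 bytes, which is above 0xfffff = 1,048,575.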
2397 No Perforce job exists for this issue. 0 42043
3 years, 9 weeks, 1 day ago 0|i07k7z:
ZooKeeper ZOOKEEPER-1161

Provide an option for disabling auto-creation of the data directory

New Feature Resolved Major Fixed Patrick D. Hunt Roman Shaposhnik Roman Shaposhnik 24/Aug/11 16:17   07/Mar/12 05:58 06/Mar/12 03:23   3.5.0 scripts, server   0 2   Currently, if ZK starts and doesn't see an existing dataDir, it tries to create it. There should be an option to tweak this behavior. As for the default, my personal opinion is to NOT allow autocreate. 2398 No Perforce job exists for this issue. 3 12512
8 years, 3 weeks, 1 day ago
Reviewed
0|i02hyv:
ZooKeeper ZOOKEEPER-1160

ZOOKEEPER-1157 test timeouts are too small

Sub-task Closed Major Fixed Benjamin Reed Benjamin Reed Benjamin Reed 23/Aug/11 01:03   23/Nov/11 14:22 05/Sep/11 14:32   3.4.0 tests   0 0   In reviewing some tests that weren't passing, I noticed that the tick time was 2ms rather than the normal 2000ms. I think this is causing tests to fail on some slow/overloaded machines. 3970 No Perforce job exists for this issue. 2 33324
8 years, 29 weeks, 2 days ago
Reviewed
0|i062fb:
ZooKeeper ZOOKEEPER-1159

ClientCnxn does not propagate session expiration indication

Bug Resolved Major Won't Fix Andor Molnar Andrew Kyle Purtell Andrew Kyle Purtell 20/Aug/11 13:16   08/May/18 16:43 08/May/18 16:42 3.4.0 3.4.0 java client   6 11   ClientCnxn does not always propagate session expiration indication up to clients. If a reconnection attempt fails because the session has since expired, the KeeperState is still Disconnected; shouldn't it be set to Expired? Perhaps like so:

{code}
--- a/src/java/main/org/apache/zookeeper/ClientCnxn.java
+++ b/src/java/main/org/apache/zookeeper/ClientCnxn.java
@@ -1160,6 +1160,7 @@ public class ClientCnxn {
clientCnxnSocket.doTransport(to, pendingQueue, outgoingQueue);

} catch (Exception e) {
+ Event.KeeperState eventState = Event.KeeperState.Disconnected;
if (closing) {
if (LOG.isDebugEnabled()) {
// closing so this is expected
@@ -1172,6 +1173,7 @@ public class ClientCnxn {
// this is ugly, you have a better way speak up
if (e instanceof SessionExpiredException) {
LOG.info(e.getMessage() + ", closing socket connection");
+ eventState = Event.KeeperState.Expired;
} else if (e instanceof SessionTimeoutException) {
LOG.info(e.getMessage() + RETRY_CONN_MSG);
} else if (e instanceof EndOfStreamException) {
@@ -1191,7 +1193,7 @@ public class ClientCnxn {
if (state.isAlive()) {
eventThread.queueEvent(new WatchedEvent(
Event.EventType.None,
- Event.KeeperState.Disconnected,
+ eventState,
null));
}
clientCnxnSocket.updateNow();
{code}

This affects HBase. HBase master and region server processes will shut down by design if their session has expired, but will attempt to reconnect if they think they have been disconnected. The above prevents proper termination.
165 No Perforce job exists for this issue. 0 32686
2 years, 4 weeks, 2 days ago 0|i05yhj:
ZooKeeper ZOOKEEPER-1158

C# client

Improvement Open Major Unresolved Eric Hauser Eric Hauser Eric Hauser 19/Aug/11 23:17   14/Dec/19 06:08     3.7.0     0 7   Native C# client for ZooKeeper. 2399 No Perforce job exists for this issue. 4 42044
7 years, 49 weeks, 2 days ago 0|i07k87:
ZooKeeper ZOOKEEPER-1157

Some of the tests timeout or cause JVM crash

Bug Open Minor Unresolved Unassigned Vishal Kathuria Vishal Kathuria 19/Aug/11 15:58   29/Jun/12 13:11   3.3.3   tests   0 1   ZOOKEEPER-1160 The following tests are consistently timing out for me, and sometimes they crash the JVM. We need to look at these tests and make sure they pass consistently, otherwise they provide no value.

org.apache.zookeeper.test.AsyncHammerTest
org.apache.zookeeper.test.FollowerResyncConcurrencyTest
org.apache.zookeeper.test.ObserverQuorumHammerTest
org.apache.zookeeper.test.QuorumHammerTest
org.apache.zookeeper.test.QuorumTest
test 2400 No Perforce job exists for this issue. 0 32687
8 years, 31 weeks, 2 days ago 0|i05yhr:
ZooKeeper ZOOKEEPER-1156

Log truncation truncating log too much - can cause data loss

Bug Closed Blocker Fixed Vishal Kathuria Vishal Kathuria Vishal Kathuria 18/Aug/11 13:48   23/Nov/11 14:22 05/Sep/11 16:04 3.3.3 3.3.4, 3.4.0 quorum, server   0 2 86400 86400 0% The log truncation relies on computing the file position of a particular zxid to determine the new size of the log file. There is a bug in the PositionInputStream implementation that skips counting bytes in the log whose value is 0. This can lead to underestimating the actual log size, so log records that should be kept can be truncated, leading to data loss on the participant executing the trunc.

Clients can see different values depending on whether they connect to the node on which trunc was executed.
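Below is a sketch of a position-counting stream that avoids the zero-byte bug. `CountingInputStream` is a hypothetical class modeled on the issue's description of `PositionInputStream`, not on its actual source. Single-byte `read()` returns -1 at end of stream and 0 for a legitimate 0x00 byte, so a byte must be counted whenever the result is not -1:

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Tracks how many bytes have been consumed from the wrapped stream. */
final class CountingInputStream extends FilterInputStream {
    private long position;

    CountingInputStream(InputStream in) {
        super(in);
    }

    long getPosition() {
        return position;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) { // NOT "b > 0": a 0x00 byte is a real byte
            position++;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) { // n is a count here; -1 means end of stream
            position += n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        position += skipped;
        return skipped;
    }

    @Override
    public boolean markSupported() {
        return false; // mark/reset would desynchronize the count
    }
}
```

Reading the four bytes `{0, 0, 7, 0}` through this stream yields a position of 4. A `> 0` test would report 1, which is how the computed truncation point could land too early.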
0% 0% 86400 86400 3971 No Perforce job exists for this issue. 1 32688
8 years, 29 weeks, 5 days ago 0|i05yhz:
ZooKeeper ZOOKEEPER-1155

Add windows automated builds (CI) for zookeeper c client bindings

Improvement Resolved Major Fixed Camille Fournier Dheeraj Agrawal Dheeraj Agrawal 16/Aug/11 13:26   20/Oct/11 10:09 20/Oct/11 10:09 3.3.4, 3.4.0   c client   0 1   Set up a CI build on Windows to make sure that newly checked-in code for the ZooKeeper C bindings compiles on Windows (VS compilers).

There is a ticket open with the INFRA team to assign a build box and set up a CI environment for the ZooKeeper C bindings:
https://issues.apache.org/jira/browse/INFRA-3842

Feel free to help us with this effort; it will ensure that new check-ins don't break Windows builds.
2401 No Perforce job exists for this issue. 2 33325
8 years, 23 weeks ago 0|i062fj:
ZooKeeper ZOOKEEPER-1154

Data inconsistency when the node(s) with the highest zxid is not present at the time of leader election

Bug Closed Blocker Fixed Vishal Kathuria Vishal Kathuria Vishal Kathuria 15/Aug/11 13:36   23/Nov/11 14:22 05/Sep/11 16:04 3.3.3 3.3.4, 3.4.0 quorum   0 2 1814400 1814400 0% If a participant with the highest zxid (let's call it A) isn't present during leader election, a participant with a lower zxid (say B) might be chosen as leader. When A comes up, it will replay the log with that higher zxid. The change in that higher zxid will only be visible to clients connecting to participant A, not to clients of the other participants.

I was able to reproduce this problem by
1. Connect a debugger to B and C and suspend them, so they don't write anything.
2. Issue an update to the leader A.
3. After a few seconds, crash all servers (A,B,C)
4. Start B and C, let the leader election take place
5. Start A.
6. You will find that the update done in step 2 is visible on A but not on B,C, hence the inconsistency.

Below is a more detailed analysis of what is happening in the code.


Initial Condition
1. Let's say there are three nodes in the ensemble, A, B and C, with A being the leader.
2. The current epoch is 7.
3. For simplicity of the example, let's say a zxid is a two-digit number, with the epoch being the first digit.
4. The zxid is 73
5. All the nodes have seen the change 73 and have persistently logged it.
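The two-digit zxids in this example stand in for real ones. In ZooKeeper a zxid is a 64-bit value whose high 32 bits hold the epoch and whose low 32 bits hold a counter within that epoch. The helper below (a hypothetical `Zxid` class, not ZooKeeper's own utility) shows why every epoch-8 zxid sorts after every epoch-7 zxid, just as 80 > 74 in the toy numbering.

```java
// Hypothetical helper mirroring the epoch/counter split of a 64-bit zxid.
final class Zxid {
    private Zxid() {}

    // Epoch in the high 32 bits, per-epoch counter in the low 32 bits.
    static long make(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }
}
```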

Step 1
Request with zxid 74 is issued. The leader A writes it to its log, but the entire ensemble crashes and B and C never write change 74 to their logs.

Step 3
B,C restart, A is still down
B,C form the quorum
B is the new leader. Let's say B's minCommitLog is 71 and maxCommitLog is 73.
epoch is now 8, zxid is 80
Request with zxid 81 is successful. On B, minCommitLog is now 71, maxCommitLog is 81

Step 4
A starts up. It applies the change in request with zxid 74 to its in-memory data tree
A contacts B to registerAsFollower and provides 74 as its ZxId
Since 71<=74<=81, B decides to send A the diff. B will send to A the proposal 81.


Problem:
The problem with the above sequence is that A's data tree has the update from request 74, which is not correct. Before getting the proposals 81, A should have received a trunc to 73. I don't see that in the code. If the maxCommitLog on B hadn't bumped to 81 but had stayed at 73, that case seems to be fine.
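The check the reporter finds missing can be sketched as follows. This is a simplified model under stated assumptions, not ZooKeeper's `LearnerHandler` logic: the leader's committed log is treated as a sorted set of zxids (here the toy values 71, 72, 73 and 81 from the example), and `SyncPlanner`, `SyncPlan` and `SyncMode` are hypothetical names. When the follower's last zxid falls inside the leader's range but is not in the leader's log, as with A's 74, the follower is first truncated to the newest zxid the leader does have (73) and only then sent the diff.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical model of how a leader could choose a follower sync strategy.
enum SyncMode { DIFF, TRUNC_THEN_DIFF, SNAP }

final class SyncPlan {
    final SyncMode mode;
    final long truncTo; // newest zxid the follower keeps; -1 when no TRUNC is needed

    SyncPlan(SyncMode mode, long truncTo) {
        this.mode = mode;
        this.truncTo = truncTo;
    }
}

final class SyncPlanner {
    private SyncPlanner() {}

    // committedLog: zxids held in the leader's committed log, i.e. the range
    // minCommitLog..maxCommitLog from the example.
    static SyncPlan plan(NavigableSet<Long> committedLog, long peerLastZxid) {
        if (committedLog.isEmpty() || peerLastZxid < committedLog.first()) {
            return new SyncPlan(SyncMode.SNAP, -1); // too far behind for a diff
        }
        if (committedLog.contains(peerLastZxid)) {
            return new SyncPlan(SyncMode.DIFF, -1); // follower's history is a prefix of ours
        }
        // The follower holds a zxid the leader never committed (74 in the example):
        // cut it back to the newest zxid both sides share, then send the diff.
        long common = committedLog.lower(peerLastZxid);
        return new SyncPlan(SyncMode.TRUNC_THEN_DIFF, common);
    }
}
```

For the scenario above, `SyncPlanner.plan(new TreeSet<>(Arrays.asList(71L, 72L, 73L, 81L)), 74L)` yields `TRUNC_THEN_DIFF` with `truncTo` 73, which is the trunc the reporter expected before proposal 81.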
0% 0% 1814400 1814400 3972 No Perforce job exists for this issue. 4 32689
8 years, 29 weeks, 5 days ago 0|i05yi7:
ZooKeeper ZOOKEEPER-1153

Deprecate AuthFLE and LE

Improvement Closed Major Fixed Flavio Paiva Junqueira Flavio Paiva Junqueira Flavio Paiva Junqueira 15/Aug/11 05:33   23/Nov/11 14:22 30/Aug/11 02:29 3.3.3 3.4.0     0 0   I propose we mark these as deprecated in 3.4.0 and remove them in the following release. 3973 No Perforce job exists for this issue. 2 33326
8 years, 30 weeks, 2 days ago
Reviewed
0|i062fr:
ZooKeeper ZOOKEEPER-1152

Exceptions thrown from handleAuthentication can cause buffer corruption issues in NIOServer

Bug Closed Major Fixed Camille Fournier Camille Fournier Camille Fournier 12/Aug/11 16:27   23/Nov/11 14:22 20/Aug/11 21:05 3.3.3, 3.4.0 3.4.0 server   0 1   Exceptions thrown by an AuthenticationProvider's handleAuthentication method will not be caught, and can cause the buffers in the NIOServer to not read requests fully or properly. Any exceptions thrown here should be caught and treated as auth failure. 3974 No Perforce job exists for this issue. 1 32690
8 years, 31 weeks, 3 days ago
Reviewed
0|i05yif:
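The fix described in ZOOKEEPER-1152, treating any exception from the provider as an authentication failure, can be sketched with a hypothetical provider interface. `AuthProvider`, `AuthResult` and `SafeAuth` are illustration names only; ZooKeeper's real `AuthenticationProvider.handleAuthentication` has a different signature.

```java
// Hypothetical provider interface; ZooKeeper's real AuthenticationProvider differs.
interface AuthProvider {
    boolean authenticate(byte[] authData) throws Exception;
}

enum AuthResult { OK, AUTH_FAILED }

final class SafeAuth {
    private SafeAuth() {}

    static AuthResult authenticate(AuthProvider provider, byte[] authData) {
        try {
            return provider.authenticate(authData) ? AuthResult.OK : AuthResult.AUTH_FAILED;
        } catch (Exception e) {
            // Any failure inside the provider counts as a rejected credential instead of
            // escaping into the connection's I/O loop with the request only partly consumed.
            return AuthResult.AUTH_FAILED;
        }
    }
}
```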
ZooKeeper ZOOKEEPER-1151

http://zookeeper.apache.org/doc/trunk/api/ missing

Improvement Open Trivial Unresolved Unassigned Eugene Joseph Koontz Eugene Joseph Koontz 10/Aug/11 13:57   10/Aug/11 13:57   3.4.0   documentation   0 0   I see in http://zookeeper.apache.org/doc/ that we have http://zookeeper.apache.org/doc/trunk/, but http://zookeeper.apache.org/doc/trunk/api/ is a 404. I can generate the docs locally, but it would be useful to have URLs for referencing the trunk API (e.g. when discussing new features in JIRA). 2402 No Perforce job exists for this issue. 0 42045
8 years, 33 weeks, 1 day ago 0|i07k8f:
ZooKeeper ZOOKEEPER-1150

ZOOKEEPER-1027 fix for this patch to compile on windows...

Sub-task Closed Blocker Fixed Dheeraj Agrawal Dheeraj Agrawal Dheeraj Agrawal 10/Aug/11 11:58   23/Nov/11 14:22 14/Aug/11 13:48 3.3.3 3.4.0 c client   0 3   fix for this patch to compile on windows... 3975 No Perforce job exists for this issue. 1 33327
8 years, 32 weeks, 3 days ago
Reviewed
0|i062fz:
ZooKeeper ZOOKEEPER-1149

users cannot migrate from 3.4->3.3->3.4 server code against a single datadir

Task Closed Blocker Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 04/Aug/11 18:15   11/Oct/13 17:49 24/Oct/11 02:49 3.4.0, 3.5.0 3.4.0, 3.5.0 server   0 0   3.4 is checking acceptedEpoch/currentEpoch files against the snap/log files in datadir. These files are new in 3.4. If they don't exist the server will create them, however if they do exist the server will validate them.

As a result if a user
1) upgrades from 3.3 to 3.4 this is fine
2) downgrades from 3.4 to 3.3 this is also fine (3.3 ignores these files)
3) however, 3.4->3.3->3.4 fails because 3.4 will see invalid *Epoch files in the datadir (as 3.3 would have ignored them, applying changes to snap/log w/o updating them)
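The failing validation can be modeled as a comparison between the epoch stored in `currentEpoch` and the epoch of the newest zxid found in the snapshot and log files. The sketch below is a simplified assumption about that check (a hypothetical `EpochCheck` class, not the 3.4 source): after a 3.4 -> 3.3 -> 3.4 round trip, 3.3 may have logged transactions in a newer epoch without updating `currentEpoch`, so the stored value looks stale.

```java
// Hypothetical model of the 3.4 startup check on the currentEpoch file.
final class EpochCheck {
    private EpochCheck() {}

    static long epochOf(long zxid) {
        return zxid >>> 32; // the high 32 bits of a zxid hold the epoch
    }

    // Throws if the on-disk currentEpoch is behind the data in snap/log.
    static void validate(long currentEpoch, long lastLoggedZxid) {
        long dataEpoch = epochOf(lastLoggedZxid);
        if (dataEpoch > currentEpoch) {
            throw new IllegalStateException("currentEpoch " + currentEpoch
                    + " is older than epoch " + dataEpoch + " of the last logged zxid");
        }
    }
}
```

The workaround in the release note, deleting the epoch files, corresponds to letting the server recreate them from the data instead of comparing against stale values.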

163 No Perforce job exists for this issue. 0 33328
6 years, 23 weeks, 6 days ago The ZooKeeper server cannot be migrated from version 3.4 to version 3.3 and then back to version 3.4 without user intervention.

Upgrading from 3.3 to 3.4 is supported as is downgrading from 3.4 to 3.3. However moving from 3.4 to 3.3 and back to 3.4 will fail. 3.4 is checking the datadir for "acceptedEpoch" and "currentEpoch" files and comparing these against the snapshot and log files contained in the same directory. These epoch files are new in 3.4.

As a result:
1) upgrading from 3.3 to 3.4 is fine - the files don't exist, the server creates them
2) downgrading from 3.4 to 3.3 - this is also fine as version 3.3 ignores these files
3) however, 3.4->3.3->3.4 fails because 3.4 will see invalid *Epoch files in the datadir (as 3.3 would have ignored them, applying changes to snap/log w/o updating them)

A workaround for this problem is to delete the epoch files if this situation occurs; the version 3.4 server will then recreate them, as in case 1) above.
Incompatible change
0|i062g7:
ZooKeeper ZOOKEEPER-1148

Multi-threaded handling of reads

Improvement Open Major Unresolved Unassigned Vishal Kathuria Vishal Kathuria 04/Aug/11 15:52   14/Dec/19 06:07     3.7.0 server   0 1   This improvement is to take advantage of multiple cores in the machines that typically run ZooKeeper servers to get higher read throughput.

The challenge with multiple threads is read/write ordering guarantees that ZooKeeper provides.

One way of handling these is to let readOnly clients use the multiple threads, and the read/write clients continue to use the same single CommitProcessor thread for both reads and writes. For this to work, a client would have to declare its readOnly intent through a flag at connect time. (We already have a readOnly flag, but its intent is a bit different).

Another way of honoring the read/write guarantee is to let all sessions start as readOnly sessions and have them use the multi-threaded reads until they do their first write. Once a session performs a write, it automatically flips from a read-only session to a read/write session and starts using the single-threaded CommitProcessor. This is a little tricky, as one has to worry about reads still in flight when the write arrives and make sure those reads finish before the write goes through.

I would like to get the community's feedback on whether it would be useful to have this and whether an automatic discovery of readOnly or read/write intent is critical for this to be useful. For us, the clients know at connect time whether they will ever do a write or not, so an automatic detection is of limited use.
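The first option (clients declare read-only intent at connect time) can be sketched as a router in front of several executors. `ReadDispatcher`, the lane layout and the `sessionIsReadOnly` flag are hypothetical; the single-thread `commitProcessor` executor stands in for the existing CommitProcessor thread. Read-only sessions are spread over single-threaded lanes chosen by session id, so different sessions run in parallel while each session's own requests stay in order. The sketch deliberately ignores how lane reads interleave with commits being applied, which is the harder part of the proposal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical router: read-only sessions fan out, read/write sessions stay serialized.
final class ReadDispatcher {
    private final ExecutorService[] readLanes;
    // Stand-in for the single CommitProcessor thread; preserves submission order.
    private final ExecutorService commitProcessor = Executors.newSingleThreadExecutor();

    ReadDispatcher(int lanes) {
        readLanes = new ExecutorService[lanes];
        for (int i = 0; i < lanes; i++) {
            readLanes[i] = Executors.newSingleThreadExecutor();
        }
    }

    Future<?> submit(long sessionId, boolean sessionIsReadOnly, boolean isWrite, Runnable request) {
        if (sessionIsReadOnly && isWrite) {
            throw new IllegalStateException("read-only session attempted a write");
        }
        // Same session -> same lane, so per-session FIFO order is kept.
        ExecutorService target = sessionIsReadOnly
                ? readLanes[(int) Math.floorMod(sessionId, (long) readLanes.length)]
                : commitProcessor;
        return target.submit(request);
    }

    // Stops accepting work and waits for queued requests to finish.
    boolean drain(long timeoutMs) {
        List<ExecutorService> all = new ArrayList<>(Arrays.asList(readLanes));
        all.add(commitProcessor);
        all.forEach(ExecutorService::shutdown);
        try {
            for (ExecutorService e : all) {
                if (!e.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS)) {
                    return false;
                }
            }
            return true;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```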


scaling 2403 No Perforce job exists for this issue. 1 42046
8 years, 31 weeks, 6 days ago 0|i07k8n:
ZooKeeper ZOOKEEPER-1147

Add support for local sessions

Improvement Resolved Major Fixed Thawan Kooburat Vishal Kathuria Vishal Kathuria 04/Aug/11 15:06   22/May/19 17:52 09/Oct/13 17:19 3.3.3 3.5.0 server   3 17 3024000 3024000 0% This improvement is in the bucket of making ZooKeeper work at a large scale. We are planning on having about 1 million clients connect to a ZooKeeper ensemble through a set of 50-100 observers. The majority of these clients are read-only, i.e. they do not do any updates or create ephemeral nodes.

In ZooKeeper today, the client creates a session and the session creation is handled like any other update. In the above use case, the session create/drop workload can easily overwhelm an ensemble. The following is a proposal for a "local session", to support a larger number of connections.

1. The idea is to introduce a new type of session, the "local" session. A local session doesn't have the full functionality of a normal session.
2. Local sessions cannot create ephemeral nodes.
3. Once a local session is lost, you cannot re-establish it using the session-id/password. The session and its watches are gone for good.
4. When a local session connects, the session info is only maintained on the zookeeper server (in this case, an observer) that it is connected to. The leader is not aware of the creation of such a session and there is no state written to disk.
5. Pings and expiration are handled by the server that the session is connected to.

With the above changes, we can make ZooKeeper scale to a much larger number of clients without making the core ensemble a bottleneck.

In terms of API, there are two options that are being considered
1. Let the client specify at connect time which kind of session it wants.
2. All sessions connect as local sessions and automatically get promoted to global sessions when they do an operation that requires a global session (e.g. creating an ephemeral node)

Chubby took the approach of lazily promoting all sessions to global, but I don't think that would work in our case, where we want to keep sessions which never create ephemeral nodes as always local. Option 2 would make it more broadly usable but option 1 would be easier to implement.

We are thinking of implementing option 1 as the first cut. There would be a client flag, IsLocalSession (much like the current readOnly flag) that would be used to determine whether to create a local session or a global session.
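The properties listed above for a local session (state kept only on the connected server, pings and expiration handled locally, no ephemeral nodes, no re-establishment after loss) can be sketched as an in-memory tracker. `LocalSessionTracker` and its methods are hypothetical names, not ZooKeeper's session tracker API. Only local sessions are modeled here; a client that set the proposed `IsLocalSession` flag would be handled by such a tracker, while global sessions would keep going through the leader.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-server tracker for "local" sessions: nothing here is sent
// to the leader or written to disk.
final class LocalSessionTracker {
    private static final class Session {
        final int timeoutMs;
        long deadlineMs;

        Session(int timeoutMs, long deadlineMs) {
            this.timeoutMs = timeoutMs;
            this.deadlineMs = deadlineMs;
        }
    }

    private final Map<Long, Session> sessions = new HashMap<>();
    private long nextId = 1;

    long create(int timeoutMs, long nowMs) {
        long id = nextId++;
        sessions.put(id, new Session(timeoutMs, nowMs + timeoutMs));
        return id;
    }

    // A client ping extends the deadline; returns false if the session has lapsed.
    boolean touch(long id, long nowMs) {
        Session s = sessions.get(id);
        if (s == null || s.deadlineMs <= nowMs) {
            return false;
        }
        s.deadlineMs = nowMs + s.timeoutMs;
        return true;
    }

    // Expired local sessions are dropped for good; they cannot be re-established.
    void expire(long nowMs) {
        sessions.values().removeIf(s -> s.deadlineMs <= nowMs);
    }

    // True until the next expire() pass removes the session.
    boolean isAlive(long id) {
        return sessions.containsKey(id);
    }

    // Local sessions may not own ephemeral nodes.
    void checkCreate(long id, boolean ephemeral) {
        if (!isAlive(id)) {
            throw new IllegalStateException("session expired: " + id);
        }
        if (ephemeral) {
            throw new UnsupportedOperationException("local session cannot create ephemeral nodes");
        }
    }
}
```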


0% 0% 3024000 3024000 api-change, scaling 2404 No Perforce job exists for this issue. 9 42047
43 weeks, 1 day ago 0|i07k8v:
ZooKeeper ZOOKEEPER-1146

significant regression in client (c/python) performance

Bug Closed Blocker Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 04/Aug/11 13:21   23/Nov/11 14:22 14/Aug/11 13:30 3.4.0 3.4.0 c client   0 2   I tried running my latency tester against trunk; in so doing I noticed that C/Python (not sure which yet) client performance has seriously degraded since 3.3.3.

The first run (below) is with released 3.3.3 client against a 3 server ensemble running released 3.3.3 server code. The second run is the exact same environment (same ensemble), however using trunk c/zkpython client.

Notice:

1) in the first run operations are approx 10ms/write, 0.25ms/read - which is pretty much what's expected.

2) however in the second run we are seeing 50ms/operation regardless of read or write.

{noformat}
[phunt@c0309 zk-smoketest-3.3.3]$ PYTHONPATH=lib.linux-x86_64-2.6/ LD_LIBRARY_PATH=lib.linux-x86_64-2.6/ python26 ./zk-latencies.py --servers "c0309:2181,c0310:2181,c0311:2181" --znode_size=100 --znode_count=100 --timeout=5000 --synchronous
Connecting to c0309:2181
Connected in 16 ms, handle is 0
Connecting to c0310:2181
Connected in 16 ms, handle is 1
Connecting to c0311:2181
Connected in 15 ms, handle is 2
Testing latencies on server c0309:2181 using syncronous calls
created 100 permanent znodes in 959 ms (9.599378 ms/op 104.173415/sec)
set 100 znodes in 933 ms (9.332101 ms/op 107.157002/sec)
get 100 znodes in 27 ms (0.270889 ms/op 3691.551589/sec)
deleted 100 permanent znodes in 881 ms (8.812950 ms/op 113.469388/sec)
created 100 ephemeral znodes in 956 ms (9.564152 ms/op 104.557103/sec)
watched 100 znodes in 26 ms (0.264361 ms/op 3782.707587/sec)
deleted 100 ephemeral znodes in 881 ms (8.819292 ms/op 113.387792/sec)
notif 100 watches in 999 ms (9.994299 ms/op 100.057038/sec)
Testing latencies on server c0310:2181 using syncronous calls
created 100 permanent znodes in 964 ms (9.640460 ms/op 103.729490/sec)
set 100 znodes in 933 ms (9.332800 ms/op 107.148981/sec)
get 100 znodes in 29 ms (0.299308 ms/op 3341.036650/sec)
deleted 100 permanent znodes in 886 ms (8.864651 ms/op 112.807603/sec)
created 100 ephemeral znodes in 958 ms (9.585140 ms/op 104.328161/sec)
watched 100 znodes in 30 ms (0.300801 ms/op 3324.459240/sec)
deleted 100 ephemeral znodes in 886 ms (8.865030 ms/op 112.802779/sec)
notif 100 watches in 1000 ms (10.000212 ms/op 99.997878/sec)
Testing latencies on server c0311:2181 using syncronous calls
created 100 permanent znodes in 958 ms (9.582071 ms/op 104.361569/sec)
set 100 znodes in 935 ms (9.359350 ms/op 106.845024/sec)
get 100 znodes in 25 ms (0.252700 ms/op 3957.263893/sec)
deleted 100 permanent znodes in 891 ms (8.913291 ms/op 112.192013/sec)
created 100 ephemeral znodes in 958 ms (9.584489 ms/op 104.335246/sec)
watched 100 znodes in 25 ms (0.251091 ms/op 3982.627356/sec)
deleted 100 ephemeral znodes in 891 ms (8.915379 ms/op 112.165730/sec)
notif 100 watches in 1000 ms (10.000508 ms/op 99.994922/sec)
Latency test complete
[phunt@c0309 zk-smoketest-3.3.3]$ cd ../zk-smoketest-trunk/
[phunt@c0309 zk-smoketest-trunk]$ PYTHONPATH=lib.linux-x86_64-2.6/ LD_LIBRARY_PATH=lib.linux-x86_64-2.6/ python26 ./zk-latencies.py --servers "c0309:2181,c0310:2181,c0311:2181" --znode_size=100 --znode_count=100 --timeout=5000 --synchronous
Connecting to c0309:2181
Connected in 31 ms, handle is 0
Connecting to c0310:2181
Connected in 16 ms, handle is 1
Connecting to c0311:2181
Connected in 16 ms, handle is 2
Testing latencies on server c0309:2181 using syncronous calls
created 100 permanent znodes in 5099 ms (50.999281 ms/op 19.608119/sec)
set 100 znodes in 5066 ms (50.665429 ms/op 19.737324/sec)
get 100 znodes in 4009 ms (40.093150 ms/op 24.941916/sec)
deleted 100 permanent znodes in 5040 ms (50.404449 ms/op 19.839519/sec)
created 100 ephemeral znodes in 5124 ms (51.249170 ms/op 19.512511/sec)
watched 100 znodes in 4051 ms (40.514441 ms/op 24.682557/sec)
deleted 100 ephemeral znodes in 5048 ms (50.484939 ms/op 19.807888/sec)
notif 100 watches in 1000 ms (10.004182 ms/op 99.958199/sec)
Testing latencies on server c0310:2181 using syncronous calls
created 100 permanent znodes in 5115 ms (51.157510 ms/op 19.547472/sec)
set 100 znodes in 5056 ms (50.568910 ms/op 19.774996/sec)
get 100 znodes in 4099 ms (40.999382 ms/op 24.390612/sec)
deleted 100 permanent znodes in 5041 ms (50.418010 ms/op 19.834182/sec)
created 100 ephemeral znodes in 5083 ms (50.835850 ms/op 19.671157/sec)
watched 100 znodes in 4100 ms (41.003261 ms/op 24.388304/sec)
deleted 100 ephemeral znodes in 5058 ms (50.581930 ms/op 19.769906/sec)
notif 100 watches in 1000 ms (10.005081 ms/op 99.949219/sec)
Testing latencies on server c0311:2181 using syncronous calls
created 100 permanent znodes in 5099 ms (50.992720 ms/op 19.610642/sec)
set 100 znodes in 5091 ms (50.916569 ms/op 19.639972/sec)
get 100 znodes in 4099 ms (40.996401 ms/op 24.392385/sec)
deleted 100 permanent znodes in 5066 ms (50.669601 ms/op 19.735699/sec)
created 100 ephemeral znodes in 5124 ms (51.249208 ms/op 19.512496/sec)
watched 100 znodes in 4099 ms (40.999141 ms/op 24.390755/sec)
deleted 100 ephemeral znodes in 5049 ms (50.498819 ms/op 19.802443/sec)
notif 100 watches in 999 ms (9.997852 ms/op 100.021486/sec)
Latency test complete
{noformat}
3976 No Perforce job exists for this issue. 1 32691
8 years, 29 weeks ago
Reviewed
0|i05yin:
ZooKeeper ZOOKEEPER-1145

ObserverTest.testObserver fails at a particular point after several runs of ant junit.run -Dtestcase=ObserverTest

Bug Closed Blocker Duplicate Vishal Kher Eugene Joseph Koontz Eugene Joseph Koontz 03/Aug/11 18:36   23/Nov/11 14:22 14/Aug/11 21:02 3.4.0 3.4.0     0 0   Use the attached repeat.sh to run ObserverTest repeatedly by doing:

src/repeat.sh ObserverTest

The test will fail eventually after a few iterations; this should take only a few minutes.

The line that fails in the test is:

zk = new ZooKeeper("127.0.0.1:" + CLIENT_PORT_OBS,
ClientBase.CONNECTION_TIMEOUT, this);

Attached as out.txt is the output showing a successful run, for comparison, followed by a failed run.


Note that in the seconds before the test fails there is a 24-second gap in the log (between 22:13:02 and 22:13:26), visible in the following lines:

bq.
[junit] 2011-08-03 22:13:02,167 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11229:ZooKeeperServer@833] - Client attempting to establish new session at /127.0.0.1:46929
[junit] 2011-08-03 22:13:26,003 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11228:Leader@419] - Shutting down
[junit] 2011-08-03 22:13:26,003 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11228:Leader@425] - Shutdown called
[junit] java.lang.Exception: shutdown Leader! reason: Only 0 followers, need 1


3977 No Perforce job exists for this issue. 2 32692
8 years, 32 weeks, 3 days ago 0|i05yiv:
ZooKeeper ZOOKEEPER-1144

ZooKeeperServer not starting on leader due to a race condition

Bug Closed Blocker Fixed Vishal Kher Vishal Kher Vishal Kher 03/Aug/11 18:35   23/Nov/11 14:22 11/Aug/11 14:10 3.4.0 3.4.0     0 1   I have found one problem that is causing QuorumPeerMainTest:testQuorum to fail. This test uses 2 ZK servers.

The test is failing because the leader is not starting ZooKeeperServer after leader election, so everything halts.

With the new changes, the server is now started in Leader.processAck(), which is called from LearnerHandler. processAck() starts ZooKeeperServer if a majority have acked NEWLEADER. The leader puts its own ack in the ackSet in Leader.lead(). Since processAck() is called from LearnerHandler, the learner's ack can be processed before the leader has put its ack in the ackSet. When LearnerHandler invokes processAck(), the ackSet for newLeaderProposal will not have a quorum (in this case 2). As a result, ZooKeeperServer is never started on the leader.

The leader needs to ensure that its ack is put in ackSet before starting LearnerCnxAcceptor or invoke processAck() itself after adding to ackSet. I haven't had time to go through the ZAB2 changes so I am not too familiar with the code. Can Ben/Flavio fix this?
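One way to close this race is to route the leader's own ack through the same synchronized `processAck()` path as the learners' acks, so that whichever ack completes the quorum starts the server. The sketch assumes hypothetical names (`NewLeaderAckTracker`, and a `Runnable` standing in for starting ZooKeeperServer) and models only the NEWLEADER ack set, not ZooKeeper's `Leader` class.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the NEWLEADER ack set; not ZooKeeper's Leader class.
final class NewLeaderAckTracker {
    private final Set<Long> ackSet = new HashSet<>();
    private final int quorumSize;
    private final Runnable startServer;
    private boolean started = false;

    NewLeaderAckTracker(int quorumSize, Runnable startServer) {
        this.quorumSize = quorumSize;
        this.startServer = startServer;
    }

    // Used for the leader's own ack and for every learner ack alike. Because the
    // quorum check runs on every call, the order in which acks arrive no longer matters.
    synchronized void processAck(long sid) {
        ackSet.add(sid);
        if (!started && ackSet.size() >= quorumSize) {
            started = true;
            startServer.run();
        }
    }

    synchronized boolean isStarted() {
        return started;
    }
}
```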
3978 No Perforce job exists for this issue. 1 32693
8 years, 33 weeks ago 0|i05yj3:
ZooKeeper ZOOKEEPER-1143

quorum send & recv workers are missing thread names

Improvement Closed Minor Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 03/Aug/11 17:27   23/Nov/11 14:22 14/Aug/11 20:44   3.4.0 server   0 0   Simplifies debugging. 3979 No Perforce job exists for this issue. 1 33329
8 years, 32 weeks, 3 days ago
Reviewed
0|i062gf:
ZooKeeper ZOOKEEPER-1142

incorrect stat output

Bug Closed Blocker Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 02/Aug/11 20:29   23/Nov/11 14:22 11/Aug/11 02:00 3.4.0 3.4.0 server   0 1   The stat output seems to be missing some line endings:

{noformat}
echo stat |nc c0309 2181
Zookeeper version: 3.4.0--1, built on 08/02/2011 22:25 GMT
Clients:
/172.29.81.91:33378[0](queued=0,recved=1,sent=0
Latency min/avg/max: 0/28/252
Received: 246844
Sent: 266737
Outstanding: 0
Zxid: 0x4000508c2
Mode: follower
Node count: 4
{noformat}

Multiple clients end up on the same line (missing newline)
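The expected layout puts each connection's summary on its own line, closed by a parenthesis and a newline. A minimal formatting sketch, assuming a hypothetical `StatFormatter` helper and the field names visible in the output above (the bracketed number is modeled as an assumed integer `ops` field); it is not the server's actual connection-dump code.

```java
import java.util.List;

// Hypothetical formatter for the "Clients:" section of the stat output.
final class StatFormatter {
    private StatFormatter() {}

    // One entry per connection; the trailing ")\n" keeps entries on separate lines.
    static String formatClient(String remote, int ops, long queued, long recved, long sent) {
        return String.format("%s[%d](queued=%d,recved=%d,sent=%d)\n",
                remote, ops, queued, recved, sent);
    }

    static String formatClients(List<String> entries) {
        StringBuilder sb = new StringBuilder("Clients:\n");
        for (String entry : entries) {
            sb.append(entry);
        }
        return sb.append('\n').toString();
    }
}
```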
3980 No Perforce job exists for this issue. 1 32694
8 years, 33 weeks ago
Reviewed
0|i05yjb:
ZooKeeper ZOOKEEPER-1141

zkpython fails tests under python 2.4

Bug Closed Major Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 02/Aug/11 19:31   23/Nov/11 14:22 14/Aug/11 20:41 3.4.0 3.4.0 contrib-bindings   0 1   "ant test" under python 2.4 is failing due to a small issue in the test code - using a new feature introduced in 2.5.

I have a small patch which addresses this, after which I was able to compile and run the tests successfully under python 2.4.
3981 No Perforce job exists for this issue. 1 32695
8 years, 32 weeks, 3 days ago
Reviewed
python 0|i05yjj:
ZooKeeper ZOOKEEPER-1140

server shutdown is not stopping threads

Bug Closed Blocker Fixed Laxman Patrick D. Hunt Patrick D. Hunt 29/Jul/11 12:49   23/Nov/11 14:22 30/Aug/11 02:37 3.4.0 3.4.0 server, tests   0 3   Near the end of QuorumZxidSyncTest there are tons of threads running - 115 "ProcessThread" threads, similar numbers of SessionTracker.

Also I see ~100 ReadOnlyRequestProcessor - why is this running as a separate thread? (henry/flavio?)
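A quick way to observe this kind of leak from inside a test is to count live threads by name prefix after shutdown. The sketch below relies only on `Thread.getAllStackTraces()`; the `ThreadCensus` helper is hypothetical, and any prefixes passed in (for example "ProcessThread" or "SessionTracker", taken from the names quoted above) are assumptions about how the server names its threads.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical test helper: counts live threads whose names start with each prefix.
final class ThreadCensus {
    private ThreadCensus() {}

    static Map<String, Integer> countByPrefix(String... prefixes) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String p : prefixes) {
            counts.put(p, 0);
        }
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (!t.isAlive()) {
                continue;
            }
            for (String p : prefixes) {
                if (t.getName().startsWith(p)) {
                    counts.merge(p, 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

After shutting down a server in a test, one would expect `ThreadCensus.countByPrefix("ProcessThread", "SessionTracker")` to report zero for each prefix if shutdown really stops its threads.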

3982 No Perforce job exists for this issue. 1 32696
8 years, 30 weeks, 2 days ago
Reviewed
0|i05yjr:
ZooKeeper ZOOKEEPER-1139

jenkins is reporting two warnings, fix these

Bug Closed Minor Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 27/Jul/11 17:42   23/Nov/11 14:22 28/Jul/11 19:23 3.4.0 3.4.0     0 1   Clean up the Jenkins report; currently 2 compiler warnings are being reported.
3983 No Perforce job exists for this issue. 1 32697
8 years, 33 weeks, 1 day ago
Reviewed
0|i05yjz:
ZooKeeper ZOOKEEPER-1138

release audit failing for a number of new files

Bug Closed Blocker Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 27/Jul/11 16:04   23/Nov/11 14:22 28/Jul/11 18:06 3.4.0 3.4.0     0 1   I'm seeing a number of problems in the release audit output for 3.4.0, these must be fixed before 3.4.0 release:

{noformat}
[rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/contrib/ZooInspector/config/defaultConnectionSettings.cfg
[rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/contrib/ZooInspector/config/defaultNodeVeiwers.cfg
[rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/contrib/ZooInspector/licences/epl-v10.html
[rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/c/Cli.vcproj
[rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/c/include/winconfig.h
[rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/c/include/winstdint.h
[rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/c/zookeeper.sln
[rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/c/zookeeper.vcproj
[rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/huebrowser/zkui/src/zkui/static/help/index.html
[rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/huebrowser/zkui/src/zkui/static/js/package.yml
[rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/log4j.properties
[rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/date.format.js
[rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/g.bar.js
[rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/g.dot.js
[rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/g.line.js
[rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/g.pie.js
[rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/g.raphael.js
[rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/raphael.js
[rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/yui-min.js
[rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/monitoring/JMX-RESOURCES
[rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/zooinspector/config/defaultConnectionSettings.cfg
[rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/zooinspector/config/defaultNodeVeiwers.cfg
[rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/zooinspector/lib/log4j.properties
[rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/zooinspector/licences/epl-v10.html
[rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/java/test/org/apache/zookeeper/MultiTransactionRecordTest.java
[rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/java/test/org/apache/zookeeper/server/quorum/LearnerTest.java
Lines that start with ????? in the release audit report indicate files that do not have an Apache license header.
{noformat}
3984 No Perforce job exists for this issue. 1 32698
8 years, 33 weeks, 1 day ago
Reviewed
0|i05yk7:
ZooKeeper ZOOKEEPER-1137

AuthFLE is throwing NPE when servers are configured with different election ports.

Bug Open Critical Unresolved Unassigned Laxman Laxman 27/Jul/11 09:02   20/Jun/12 18:27   3.3.3   leaderElection   0 1 86400 86400 0% AuthFLE is throwing NPE when servers are configured with different election ports.

*Configuration*
{noformat}
server.1 = 10.18.52.25:2888:3888
server.2 = 10.18.52.205:2889:3889
server.3 = 10.18.52.144:2899:3890
{noformat}

*Logs*
{noformat}
2011-07-22 16:06:22,404 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:65170:AuthFastLeaderElection@844] - Election tally
2011-07-22 16:06:29,483 - ERROR [WorkerSender Thread: 6:NIOServerCnxn$Factory$1@81] - Thread Thread[WorkerSender Thread: 6,5,main] died
java.lang.NullPointerException
at org.apache.zookeeper.server.quorum.AuthFastLeaderElection$Messenger$WorkerSender.process(AuthFastLeaderElection.java:488)
at org.apache.zookeeper.server.quorum.AuthFastLeaderElection$Messenger$WorkerSender.run(AuthFastLeaderElection.java:432)
at java.lang.Thread.run(Thread.java:619)
2011-07-22 16:06:29,583 - ERROR [WorkerSender Thread: 1:NIOServerCnxn$Factory$1@81] - Thread Thread[WorkerSender Thread: 1,5,main] died
java.lang.NullPointerException
{noformat}

0% 0% 86400 86400 31 No Perforce job exists for this issue. 3 32699
8 years, 21 weeks, 6 days ago Leader Election 0|i05ykf:
ZooKeeper ZOOKEEPER-1136

NEW_LEADER should be queued not sent to match the Zab 1.0 protocol on the twiki

Bug Closed Blocker Fixed Benjamin Reed Benjamin Reed Benjamin Reed 26/Jul/11 12:58   23/Nov/11 14:22 14/Sep/11 02:59   3.4.0     0 2   The NEW_LEADER message was sent at the beginning of the sync phase in pre-1.0 Zab, but it must be sent at the end in Zab 1.0. If the protocol is 1.0 or greater, we need to queue the packet rather than send it. 3985 No Perforce job exists for this issue. 3 32700
8 years, 21 weeks, 2 days ago
Reviewed
0|i05ykn:
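The NEW_LEADER ordering change in ZOOKEEPER-1136 can be sketched as follows, assuming packets are represented by their type names and that protocol version 1.0 is encoded as `0x10000`. `NewLeaderOrdering` and `outgoing` are hypothetical names, not the `LearnerHandler` code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of where NEWLEADER lands relative to the sync packets.
final class NewLeaderOrdering {
    private NewLeaderOrdering() {}

    static final int ZAB_1_0 = 0x10000; // assumed encoding of protocol version 1.0

    static List<String> outgoing(int learnerProtocolVersion, List<String> syncPackets) {
        List<String> out = new ArrayList<>();
        if (learnerProtocolVersion < ZAB_1_0) {
            out.add("NEWLEADER"); // pre-1.0: sent at the start of the sync phase
            out.addAll(syncPackets);
        } else {
            out.addAll(syncPackets);
            out.add("NEWLEADER"); // 1.0: queued behind the sync packets
        }
        return out;
    }
}
```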
ZooKeeper ZOOKEEPER-1135

clarify usage of clientPortAddress zoo.cfg option

Improvement Open Major Unresolved Eugene Joseph Koontz Eugene Joseph Koontz Eugene Joseph Koontz 22/Jul/11 19:06   22/Jul/11 19:07       documentation   1 0   Documentation should clarify permitted usage of clientPortAddress:

Add something like:

"You must specify the port and the address separately like so:

clientPortAddress=my.hostname.com
clientPort=2181

(that is, you can't do "clientPortAddress=my.hostname.com:2181")"

2405 No Perforce job exists for this issue. 0 42048
8 years, 35 weeks, 6 days ago 0|i07k93:
ZooKeeper ZOOKEEPER-1134

ClientCnxnSocket string comparison using == rather than equals

Bug Closed Critical Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 22/Jul/11 18:18   23/Nov/11 14:22 25/Jul/11 17:32 3.4.0 3.4.0 server   0 1   Noticed string comparison using == rather than equals. 3986 No Perforce job exists for this issue. 1 32701
8 years, 35 weeks, 2 days ago
Reviewed
0|i05ykv:
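The difference behind ZOOKEEPER-1134 is that `==` compares object identity, which only happens to work for interned literals; a string built at run time with the same characters is a different object. A minimal demonstration with a hypothetical `StringCompare` helper:

```java
import java.util.Objects;

// Hypothetical helper contrasting identity and value comparison of strings.
final class StringCompare {
    private StringCompare() {}

    // Identity check: true only when both references point at the same object.
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    // Value check: compares characters and is null-safe.
    static boolean sameValue(String a, String b) {
        return Objects.equals(a, b);
    }
}
```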
ZooKeeper ZOOKEEPER-1133

ZOOKEEPER-635 allow for "clientPortAddress=host:port"

Sub-task Open Minor Unresolved Eugene Joseph Koontz Eugene Joseph Koontz Eugene Joseph Koontz 22/Jul/11 14:57   14/Dec/19 06:08     3.7.0 server   0 0   2407 No Perforce job exists for this issue. 1 42049
8 years, 35 weeks, 1 day ago Currently ZOOKEEPER-635 allows:

clientPortAddress=my.host.name
clientPort=1234

This patch lets you combine this into a single configuration line:

clientPortAddress=my.host.name:1234
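Accepting both forms could look roughly like the parser below. It is a hypothetical sketch (`ClientPortAddressParser` is not ZooKeeper's `QuorumPeerConfig`); it assumes IPv6 literals are bracketed, as in `[::1]:2181`, since a bare IPv6 address already contains colons, and it rejects a port that conflicts with `clientPort`.

```java
import java.net.InetSocketAddress;

// Hypothetical parser accepting "host", "host:port" or "[ipv6]:port".
final class ClientPortAddressParser {
    private ClientPortAddressParser() {}

    static InetSocketAddress parse(String clientPortAddress, Integer clientPort) {
        String host = clientPortAddress.trim();
        Integer port = null;
        if (host.startsWith("[")) {
            int close = host.indexOf(']');
            if (close < 0) {
                throw new IllegalArgumentException("unterminated IPv6 literal: " + host);
            }
            String rest = host.substring(close + 1);
            host = host.substring(1, close);
            if (rest.startsWith(":")) {
                port = Integer.parseInt(rest.substring(1));
            } else if (!rest.isEmpty()) {
                throw new IllegalArgumentException("unexpected text after IPv6 literal: " + rest);
            }
        } else {
            int colon = host.indexOf(':');
            // Exactly one colon means host:port; more than one is a bare IPv6 address.
            if (colon >= 0 && colon == host.lastIndexOf(':')) {
                port = Integer.parseInt(host.substring(colon + 1));
                host = host.substring(0, colon);
            }
        }
        if (port != null && clientPort != null && !port.equals(clientPort)) {
            throw new IllegalArgumentException("clientPort " + clientPort
                    + " conflicts with port " + port + " in clientPortAddress");
        }
        if (port == null) {
            port = clientPort;
        }
        if (port == null) {
            throw new IllegalArgumentException("no client port configured");
        }
        return InetSocketAddress.createUnresolved(host, port);
    }
}
```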

0|i07k9b:
ZooKeeper ZOOKEEPER-1132

ZooKeeper FAQ is out of date wrt testing SessionExpiredException

Bug Open Major Unresolved Unassigned Will Johnson Will Johnson 22/Jul/11 08:56   22/Jul/11 08:56   3.3.3   documentation   1 1   See http://markmail.org/thread/vyipodh6ar2b77a3

In addition, this other thread was mentioned as the culprit: http://markmail.org/thread/z5bt4o3quqil7r7t

Based on these threads, there still seems to be no way to programmatically test SessionExpiredExceptions. I'm not sure whether that warrants a separate ticket.
documentation, test 2408 No Perforce job exists for this issue. 0 32702
8 years, 35 weeks, 6 days ago 0|i05yl3:
ZooKeeper ZOOKEEPER-1131

Transactions can be dropped because leader election uses last committed zxid instead of last acknowledged/received zxid

Bug Resolved Major Not A Problem Unassigned Alexander Shraer Alexander Shraer 21/Jul/11 15:16   27/Jul/11 14:59 25/Jul/11 18:28 3.4.0   leaderElection, server   0 1   Suppose we have 3 servers - A, B, C which have seen the same number of commits.
- A is the leader and it sends out a new proposal.
- B doesn't receive the proposal, but A and C receive and ACK it
- A commits the proposal, but fails before anyone else sees the commit.
- B and C start leader election.
- Since both B and C saw the same number of commits, if B has a higher server-id than C, leader election will elect B. The last transaction will then be truncated from C's log, which is a bug since it was acked by a majority.

This happens because servers propose their last committed zxid in leader election, not their last received/acked zxid (which, AFAIK, is not being tracked). See method FastLeaderElection.getInitLastLoggedZxid(), which calls QuorumPeer.getLastLoggedZxid(), which is supposed to return the last logged zxid but instead calls zkDb.getDataTreeLastProcessedZxid(), which returns the last committed zxid.
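The reported scenario can be modelled with a simplified vote ordering: a higher proposed zxid wins, and ties are broken by the higher server id. This sketch ignores the election epoch entirely, and the class and field names are hypothetical, not FastLeaderElection's actual API.

```java
public class VoteOrderSketch {
    // Simplified vote: the zxid a server proposes plus its server id (epoch omitted).
    public static final class Vote {
        final String name;
        final long zxid;
        final long sid;

        public Vote(String name, long zxid, long sid) {
            this.name = name;
            this.zxid = zxid;
            this.sid = sid;
        }
    }

    // Higher zxid wins; on equal zxids, the higher server id wins.
    static boolean beats(Vote a, Vote b) {
        return a.zxid > b.zxid || (a.zxid == b.zxid && a.sid > b.sid);
    }

    public static Vote winner(Vote... votes) {
        Vote best = votes[0];
        for (Vote v : votes) {
            if (beats(v, best)) {
                best = v;
            }
        }
        return best;
    }
}
```

With B (sid 3) and C (sid 2) both proposing their committed zxid 5, B wins on server id. If C instead proposed the zxid it had acked (6), C would win and the acked transaction would survive. Note that this issue was resolved as Not A Problem; the sketch only restates the scenario as reported.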
3987 No Perforce job exists for this issue. 0 32703
8 years, 35 weeks, 1 day ago 0|i05ylb:
ZooKeeper ZOOKEEPER-1130

Java port of PHunt's zk-smoketest

New Feature Open Major Unresolved Colin Goodheart-Smithe Colin Goodheart-Smithe Colin Goodheart-Smithe 21/Jul/11 08:25   14/Dec/19 06:08   3.4.0 3.7.0 contrib   0 0   I have ported Patrick's ZooKeeper smoke test to Java so that it can be run on Windows machines (since I couldn't find any way of getting the Python bindings for Windows). The port provides the same functionality as the Python variant as of 21st June 2011. 32 No Perforce job exists for this issue. 4 42050
8 years, 12 weeks, 3 days ago 0|i07k9j:
ZooKeeper ZOOKEEPER-1129

Add RPM/Debian packages to Jenkins

Task Resolved Major Won't Fix Unassigned Eric Yang Eric Yang 20/Jul/11 14:41   03/Mar/16 11:19 03/Mar/16 11:19         0 0   To take advantage of the packages generated by ZOOKEEPER-999, it would be nice to set up RPM/Debian package builds on Jenkins. 2409 No Perforce job exists for this issue. 0 42051
4 years, 3 weeks ago 0|i07k9r:
ZooKeeper ZOOKEEPER-1128

Recipe wrong for Lock process.

Bug Resolved Major Fixed yynil yynil yynil 19/Jul/11 12:49   02/Mar/16 20:35 27/Jul/11 21:21 3.3.3   recipes   0 1   http://zookeeper.apache.org/doc/trunk/recipes.html
The current recipe for Lock describes the wrong process.
Specifically, step 4 reads:
"4. The client calls exists( ) with the watch flag set on the path in the lock directory with the next lowest sequence number."
It shouldn't be "the next lowest sequence number"; it should be the "current lowest path".

If you use "the next lowest sequence number", you will never wait for possession of the lock.

The following is the test code:

{code:title=LockTest.java|borderStyle=solid}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Date;
import java.util.List;
import java.util.concurrent.Semaphore;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.ACL;
import org.apache.zookeeper.data.Id;
import org.apache.zookeeper.data.Stat;

public class LockTest {
    public static void main(String[] args) throws Exception {
        // Built but not used below: the lock nodes are created with OPEN_ACL_UNSAFE.
        ACL acl = new ACL(ZooDefs.Perms.ALL, new Id("ip", "10.0.0.0/8"));
        List<ACL> acls = new ArrayList<ACL>();
        acls.add(acl);

        String connectStr = "localhost:2181";
        // Released once the session reaches SyncConnected.
        final Semaphore sem = new Semaphore(0);
        ZooKeeper zooKeeper = new ZooKeeper(connectStr, 1000 * 30, new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                System.out.println("eventType:" + event.getType());
                System.out.println("keeperState:" + event.getState());
                if (event.getType() == Event.EventType.None
                        && event.getState() == Event.KeeperState.SyncConnected) {
                    sem.release();
                }
            }
        });
        System.out.println("state:" + zooKeeper.getState());
        System.out.println("Waiting for the state to be connected");
        sem.acquire();
        System.out.println("Now state:" + zooKeeper.getState());

        // Make sure the lock directory exists.
        String directory = "/_locknode_";
        Stat stat = zooKeeper.exists(directory, false);
        if (stat == null) {
            zooKeeper.create(directory, new byte[]{}, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }
        // Create this client's ephemeral sequential lock node.
        String prefix = directory + "/lock-";
        String path = zooKeeper.create(prefix, new byte[]{}, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        System.out.println("Create the path for " + path);
        while (true) {
            List<String> children = zooKeeper.getChildren(directory, false);
            Collections.sort(children);
            System.out.println("The whole lock size is " + children.size());
            String lowestPath = children.get(0);
            String currentSuffix = lowestPath.substring("lock-".length());
            System.out.println("CurrentSuffix is " + currentSuffix);
            int intIndex = Integer.parseInt(currentSuffix);

            if (path.equals(directory + "/" + lowestPath)) {
                // I've got the lock; hold it for a while, then release it.
                System.out.println("I've got the lock at " + new Date());
                System.out.println("next index is " + intIndex);
                Thread.sleep(10000);
                System.out.println("After sleeping 10 seconds, I'm going to release the lock");
                zooKeeper.delete(path, -1);
                break;
            }
            // Watch the current lowest node, as this report proposes.
            final Semaphore wakeupSem = new Semaphore(0);
            stat = zooKeeper.exists(directory + "/" + lowestPath, new Watcher() {
                @Override
                public void process(WatchedEvent event) {
                    System.out.println("Event is " + event.getType());
                    System.out.println("State is " + event.getState());
                    if (event.getType() == Event.EventType.NodeDeleted) {
                        wakeupSem.release();
                    }
                }
            });
            if (stat != null) {
                System.out.println("Waiting for the delete of " + lowestPath);
                wakeupSem.acquire();
            } else {
                System.out.println("Continue to seek");
            }
        }
        zooKeeper.close();
    }
}
{code}
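To make the other reading of step 4 concrete, here is a sketch of choosing the node to watch when "next lowest sequence number" is read as the node immediately preceding the client's own node. Under that reading only one waiter wakes per release, whereas watching the lowest node (as in the test above) wakes every waiter when it is deleted. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class LockPredecessor {
    /**
     * Returns the child node this client should watch, or null if its own node
     * has the lowest sequence number (meaning it holds the lock).
     * Assumes every child uses the same "lock-" prefix, so sorting the names
     * (zero-padded sequence numbers) sorts them by sequence number.
     */
    public static String nodeToWatch(List<String> children, String ownNode) {
        List<String> sorted = new ArrayList<>(children);
        Collections.sort(sorted);
        int idx = sorted.indexOf(ownNode);
        if (idx < 0) {
            throw new IllegalArgumentException("own node not among children: " + ownNode);
        }
        // Watch only the immediate predecessor, not the lowest node.
        return idx == 0 ? null : sorted.get(idx - 1);
    }
}
```

When the watched node is deleted, the client would call getChildren again and recompute, since the predecessor may have gone away (for example, through session expiry) without the lock itself being released.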
3988 No Perforce job exists for this issue. 1 32704
8 years, 35 weeks ago 0|i05ylj:
ZooKeeper ZOOKEEPER-1127

Auth completions are called for every registered auth, and auths are never removed from the auth list (even after they are processed).

Bug Open Critical Unresolved Unassigned Dheeraj Agrawal Dheeraj Agrawal 18/Jul/11 14:50   26/Dec/12 02:10   3.3.3   c client   0 2   When we get an auth response, every time we process any auth_response we call ALL the auth completions (which might have been registered by different add_auth_info calls). Shouldn't we call only the one that the request came from? I guess we don't know which request the response corresponds to. If the requests are processed in FIFO order and the responses arrive in order, then maybe we can figure out which add_auth_info request the response corresponds to.

Also, we never remove entries from the auth_list.

Also, the logging is misleading.
<code>
if (rc) {
    LOG_ERROR(("Authentication scheme %s failed. Connection closed.",
               zh->auth_h.auth->scheme));
}
else {
    LOG_INFO(("Authentication scheme %s succeeded", zh->auth_h.auth->scheme));
}
</code>
If there are multiple auth_info entries in the auth_list, we always print success/failure for ONLY the first one. So if I had two auths for schemes ABCD and EFGH and my auth scheme EFGH failed, the logs would still say ABCD failed.
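The FIFO matching suggested above can be modelled in a few lines. This is a language-neutral sketch written in Java rather than a patch to the C client; it assumes responses arrive in the order the auth requests were sent, and every name in it is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.IntConsumer;

public class PendingAuthQueue {
    // Hypothetical pending entry: the auth scheme and the completion to call with its result.
    private static final class PendingAuth {
        final String scheme;
        final IntConsumer completion;

        PendingAuth(String scheme, IntConsumer completion) {
            this.scheme = scheme;
            this.completion = completion;
        }
    }

    private final Deque<PendingAuth> pending = new ArrayDeque<>();

    // Record a request at the moment its auth packet is queued for sending.
    public void add(String scheme, IntConsumer completion) {
        pending.addLast(new PendingAuth(scheme, completion));
    }

    // Handle one auth response. Assumes responses arrive in request order, so the
    // oldest pending entry is the one answered. Only that entry's completion runs,
    // and the entry is removed from the queue.
    public String onResponse(int rc) {
        PendingAuth head = pending.pollFirst();
        if (head == null) {
            throw new IllegalStateException("auth response with no pending request");
        }
        head.completion.accept(rc);
        return head.scheme; // the scheme a success/failure log message should name
    }

    public int size() {
        return pending.size();
    }
}
```

Returning the matched scheme also addresses the logging problem: the log line can name the scheme that actually succeeded or failed instead of always the first entry.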
2410 No Perforce job exists for this issue. 0 32705
7 years, 13 weeks, 1 day ago 0|i05ylr:
ZooKeeper ZOOKEEPER-1126

The state of zk_handle should NOT be initialized to 0 (CLOSED) in zookeeper_init. It should have a distinct uninitialized state.

Bug Resolved Major Duplicate Dheeraj Agrawal Dheeraj Agrawal Dheeraj Agrawal 18/Jul/11 14:47   18/Jul/11 17:19 18/Jul/11 17:19 3.3.3   c client   0 2   In zoo_add_auth, we have the following check:
{code}
// [ZOOKEEPER-800] zoo_add_auth should return ZINVALIDSTATE if
// the connection is closed.
if (zoo_state(zh) == 0) {
    return ZINVALIDSTATE;
}
{code}

When zookeeper_init is called, the state is initialized to 0, and the check above returns ZINVALIDSTATE whenever the state is 0.
There is a race condition: if the doIo thread is slow and has not yet changed the state to CONNECTING, zoo_add_auth returns ZINVALIDSTATE.
The problem is that we use 0 for both the CLOSED state and the uninitialized state. In the uninitialized case, the call should be allowed to go through.
Is this intentional? In Java, the uninitialized state is null.
If not, we can initialize it to some other magic number.
3989 No Perforce job exists for this issue. 0 32706
8 years, 36 weeks, 3 days ago 0|i05ylz:
ZooKeeper ZOOKEEPER-1125

Intermittent java core test failures

Bug Resolved Major Not A Problem Vishal Kher Vishal Kher Vishal Kher 13/Jul/11 16:33   15/May/14 18:53 15/May/14 18:53   3.5.0 tests   2 3   Some of the tests are failing consistently for me and intermittently on Hudson.

Posting discussion from mailing list below.

Vishal,
Can you please open a jira for this and mark it as a blocker for 3.4
release? Looks like it's transient:

https://builds.apache.org/job/ZooKeeper-trunk/

The latest build is passing.

thanks
mahadev

On Mon, Jul 11, 2011 at 12:49 PM, Vishal Kher <vishalmlst@gmail.com> wrote:
> Hi,
>
> ant test-core-java is consistently failing for me.
>
> The error seems to be either:
>
> Testcase: testFollowersStartAfterLeader took 35.577 sec
> Caused an ERROR
> Did not connect
> java.util.concurrent.TimeoutException: Did not connect
> at
> org.apache.zookeeper.test.ClientBase$CountdownWatcher.waitForConnected(ClientBase.java:124)
> at
> org.apache.zookeeper.test.QuorumTest.testFollowersStartAfterLeader(QuorumTest.java:308)
> at
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
>
> or
>
> Testcase: testNoLogBeforeLeaderEstablishment took 8.831 sec
> Caused an ERROR
> KeeperErrorCode = ConnectionLoss for /blah
> org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for /blah
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:761)
> at
> org.apache.zookeeper.test.QuorumTest.testNoLogBeforeLeaderEstablishment(QuorumTest.java:385)
> at
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
>
> Looks like the reason why the tests are failing for me is similar to why the
> tests failed on hudson:
>
> 2011-07-11 14:47:26,219 [myid:] - INFO [QuorumPeer[myid=2]/0.0.0.0:11379
> :Leader@425] - Shutdown called
> java.lang.Exception: shutdown Leader! reason: Only 0 followers, need 1
> at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:425)
> at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:400)
> at
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:729)
> 2011-07-11 14:47:26,220 [myid:] - INFO [QuorumPeer[myid=2]/0.0.0.0:11379
> :ZooKeeperServer@416] - shutting down
>
> The leader is not able to ping the followers. Has anyone seen this before?
>
> Thanks.
> -Vishal
>
> On Sun, Jul 10, 2011 at 6:52 AM, Apache Jenkins Server <
> jenkins@builds.apache.org> wrote:
>
>> See https://builds.apache.org/job/ZooKeeper-trunk/1239/
>>
>>
>> ###################################################################################
>> ########################## LAST 60 LINES OF THE CONSOLE
>> ###########################
>> [...truncated 242795 lines...]
>> [junit] 2011-07-10 10:57:16,673 [myid:] - INFO
>> [main:SessionTrackerImpl@206] - Shutting down
>> [junit] 2011-07-10 10:57:16,673 [myid:] - INFO
>> [main:PrepRequestProcessor@702] - Shutting down
>> [junit] 2011-07-10 10:57:16,674 [myid:] - INFO
>> [main:SyncRequestProcessor@170] - Shutting down
>> [junit] 2011-07-10 10:57:16,674 [myid:] - INFO
>> [SyncThread:0:SyncRequestProcessor@152] - SyncRequestProcessor exited!
>> [junit] 2011-07-10 10:57:16,675 [myid:] - INFO
>> [main:FinalRequestProcessor@423] - shutdown of request processor complete
>> [junit] 2011-07-10 10:57:16,674 [myid:] - INFO [ProcessThread(sid:0
>> cport:-1)::PrepRequestProcessor@133] - PrepRequestProcessor exited loop!
>> [junit] 2011-07-10 10:57:16,676 [myid:] - INFO [main:ClientBase@227] -
>> connecting to 127.0.0.1 11221
>> [junit] ensureOnly:[]
>> [junit] 2011-07-10 10:57:16,677 [myid:] - INFO [main:ClientBase@428] -
>> STARTING server
>> [junit] 2011-07-10 10:57:16,678 [myid:] - INFO
>> [main:ZooKeeperServer@164] - Created server with tickTime 3000
>> minSessionTimeout 6000 maxSessionTimeout 60000 datadir
>> /grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build/test/tmp/test1139867753736175617.junit.dir/version-2
>> snapdir
>> /grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build/test/tmp/test1139867753736175617.junit.dir/version-2
>> [junit] 2011-07-10 10:57:16,679 [myid:] - INFO
>> [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:11221
>> [junit] 2011-07-10 10:57:16,680 [myid:] - INFO [main:FileSnap@83] -
>> Reading snapshot
>> /grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build/test/tmp/test1139867753736175617.junit.dir/version-2/snapshot.b
>> [junit] 2011-07-10 10:57:16,683 [myid:] - INFO [main:FileTxnSnapLog@256]
>> - Snapshotting: b
>> [junit] 2011-07-10 10:57:16,684 [myid:] - INFO [main:ClientBase@227] -
>> connecting to 127.0.0.1 11221
>> [junit] 2011-07-10 10:57:16,685 [myid:] - INFO [NIOServerCxn.Factory:
>> 0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@197] - Accepted socket
>> connection from /127.0.0.1:45122
>> [junit] 2011-07-10 10:57:16,686 [myid:] - INFO [NIOServerCxn.Factory:
>> 0.0.0.0/0.0.0.0:11221:NIOServerCnxn@815] - Processing stat command from /
>> 127.0.0.1:45122
>> [junit] 2011-07-10 10:57:16,686 [myid:] - INFO
>> [Thread-5:NIOServerCnxn$StatCommand@652] - Stat command output
>> [junit] 2011-07-10 10:57:16,688 [myid:] - INFO
>> [Thread-5:NIOServerCnxn@995] - Closed socket connection for client /
>> 127.0.0.1:45122 (no session established for client)
>> [junit] ensureOnly:[InMemoryDataTree, StandaloneServer_port]
>> [junit] expect:InMemoryDataTree
>> [junit] found:InMemoryDataTree
>> org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
>> [junit] expect:StandaloneServer_port
>> [junit] found:StandaloneServer_port
>> org.apache.ZooKeeperService:name0=StandaloneServer_port-1
>> [junit] 2011-07-10 10:57:16,690 [myid:] - INFO
>> [main:JUnit4ZKTestRunner$LoggedInvokeMethod@57] - FINISHED TEST METHOD
>> testQuota
>> [junit] 2011-07-10 10:57:16,690 [myid:] - INFO [main:ClientBase@465] -
>> tearDown starting
>> [junit] 2011-07-10 10:57:16,754 [myid:] - INFO [main:ZooKeeper@662] -
>> Session: 0x13113b1aca50000 closed
>> [junit] 2011-07-10 10:57:16,754 [myid:] - INFO
>> [main-EventThread:ClientCnxn$EventThread@495] - EventThread shut down
>> [junit] 2011-07-10 10:57:16,754 [myid:] - INFO [main:ClientBase@435] -
>> STOPPING server
>> [junit] 2011-07-10 10:57:16,755 [myid:] - INFO [NIOServerCxn.Factory:
>> 0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@224] - NIOServerCnxn factory
>> exited run method
>> [junit] 2011-07-10 10:57:16,755 [myid:] - INFO
>> [main:ZooKeeperServer@416] - shutting down
>> [junit] 2011-07-10 10:57:16,756 [myid:] - INFO
>> [main:SessionTrackerImpl@206] - Shutting down
>> [junit] 2011-07-10 10:57:16,756 [myid:] - INFO
>> [main:PrepRequestProcessor@702] - Shutting down
>> [junit] 2011-07-10 10:57:16,757 [myid:] - INFO
>> [main:SyncRequestProcessor@170] - Shutting down
>> [junit] 2011-07-10 10:57:16,760 [myid:] - INFO [ProcessThread(sid:0
>> cport:-1)::PrepRequestProcessor@133] - PrepRequestProcessor exited loop!
>> [junit] 2011-07-10 10:57:16,762 [myid:] - INFO
>> [SyncThread:0:SyncRequestProcessor@152] - SyncRequestProcessor exited!
>> [junit] 2011-07-10 10:57:16,762 [myid:] - INFO
>> [main:FinalRequestProcessor@423] - shutdown of request processor complete
>> [junit] 2011-07-10 10:57:16,763 [myid:] - INFO [main:ClientBase@227] -
>> connecting to 127.0.0.1 11221
>> [junit] ensureOnly:[]
>> [junit] 2011-07-10 10:57:16,767 [myid:] - INFO [main:ClientBase@493] -
>> fdcount after test is: 35 at start it was 24
>> [junit] 2011-07-10 10:57:16,767 [myid:] - INFO [main:ClientBase@495] -
>> sleeping for 20 secs
>> [junit] 2011-07-10 10:57:16,768 [myid:] - INFO [main:ZKTestCase$1@60]
>> - SUCCEEDED testQuota
>> [junit] 2011-07-10 10:57:16,768 [myid:] - INFO [main:ZKTestCase$1@55]
>> - FINISHED testQuota
>> [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.691 sec
>>
>> BUILD FAILED
>> /grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build.xml:959:
>> The following error occurred while executing this line:
>> /grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build.xml:870:
>> Tests failed!
>>
>> Total time: 19 minutes 0 seconds
>> [FINDBUGS] Skipping publisher since build result is FAILURE
>> [WARNINGS] Skipping publisher since build result is FAILURE
>> Recording fingerprints
>> Archiving artifacts
>> Recording test results
>> Publishing Javadoc
>> Publishing Clover coverage report...
>> No Clover report will be published due to a Build Failure
>> Email was triggered for: Failure
>> Sending email for trigger: Failure
>>
>>
>>
>>
>> ###################################################################################
>> ############################## FAILED TESTS (if any)
>> ##############################
>> 2 tests failed.
>> REGRESSION: org.apache.zookeeper.test.ObserverTest.testObserver
>>
>> Error Message:
>> KeeperErrorCode = ConnectionLoss for /obstest
>>
>> Stack Trace:
>> org.apache.zookeeper.KeeperException$ConnectionLossException:
>> KeeperErrorCode = ConnectionLoss for /obstest
>> at
>> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>> at
>> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>> at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:761)
>> at
>> org.apache.zookeeper.test.ObserverTest.testObserver(ObserverTest.java:101)
>> at
>> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
>>
>>
>> REGRESSION: org.apache.zookeeper.test.ReadOnlyModeTest.testSeekForRwServer
>>
>> Error Message:
>> KeeperErrorCode = ConnectionLoss for /test
>>
>> Stack Trace:
>> org.apache.zookeeper.KeeperException$ConnectionLossException:
>> KeeperErrorCode = ConnectionLoss for /test
>> at
>> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>> at
>> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>> at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:761)
>> at
>> org.apache.zookeeper.test.ReadOnlyModeTest.testSeekForRwServer(ReadOnlyModeTest.java:213)
>> at
>> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
179 No Perforce job exists for this issue. 4 32707
5 years, 45 weeks ago 0|i05ym7:
ZooKeeper ZOOKEEPER-1124

Multiop submitted to non-leader always fails due to timeout

Bug Closed Critical Fixed Marshall McMullen Marshall McMullen Marshall McMullen 13/Jul/11 12:17   23/Nov/11 14:22 15/Jul/11 00:51 3.4.0 3.4.0 server   0 1   all The new Multiop support added under zookeeper-965 fails every single time if the multiop is submitted to a non-leader in quorum mode. In standalone mode it always works properly and this bug only presents itself in quorum mode (with 2 or more nodes). After 12 hours of debugging (*sigh*) it turns out to be a really simple fix. There are a couple of missing case statements inside FollowerRequestProcessor.java and ObserverRequestProcessor.java to ensure that multiop is forwarded to the leader for commit. I've attached a patch that fixes this problem.

It's probably worth noting that zookeeper-965 has already been committed to trunk. But this is a fatal flaw that will prevent multiop support from working properly, and as such it needs to be committed to 3.4.0 as well. Is there a way to tie these two issues together?
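The kind of fix described above, a missing case in a per-request-type switch, can be sketched as follows. The enum is a hypothetical subset standing in for the real ZooDefs.OpCode constants, and the method is an illustration of the forwarding decision rather than FollowerRequestProcessor's actual code.

```java
public class ForwardingSketch {
    // Hypothetical subset of request types; the real codes are ZooDefs.OpCode constants.
    public enum OpType { CREATE, DELETE, SET_DATA, SYNC, MULTI, GET_DATA, EXISTS }

    // Requests that change state must be sent to the leader to be proposed and committed.
    // A type that falls through to the default branch is not sent to the leader;
    // per this report, a multiop in that situation times out.
    public static boolean mustForwardToLeader(OpType op) {
        switch (op) {
            case CREATE:
            case DELETE:
            case SET_DATA:
            case SYNC:
            case MULTI: // the kind of case the patch adds
                return true;
            default:
                return false;
        }
    }
}
```

This also explains why standalone mode was unaffected: with a single server there is no leader to forward to, so the switch is never consulted.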
3990 No Perforce job exists for this issue. 1 32708
8 years, 36 weeks, 6 days ago
Reviewed
0|i05ymf:
ZooKeeper ZOOKEEPER-1123

Can't connect to ZooKeeper server with the C client library from Solaris: connect() call fails.

Bug Open Major Unresolved Unassigned Tadeusz Andrzej Kadłubowski Tadeusz Andrzej Kadłubowski 12/Jul/11 07:03   08/Feb/12 14:28   3.3.3   c client   0 1   Client: Solaris 5.10, x86 machine.
Server: Linux Fedora 14
I have a C app that runs on Solaris and connects to ZooKeeper which I run on Linux (just a single server instance, that's just a development setup).

Upon calling zookeeper_init(), I get logs saying that the connect() call fails. At the TCP level, the client sends an RST packet instead of the final ACK of the three-way handshake. The traced client syscalls are below.

Sometimes the client is able to establish a connection - after half an hour of trying, or even longer.

Logs
====

The client logs:

2011-07-11 16:20:22,954:13148(0xf):ZOO_ERROR@handle_socket_error_msg@1501: Socket [10.10.1.71:2181] zk retcode=-4, errno=0(Error 0): connect() call failed

The server logs:

2011-07-11 16:20:22,950 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251] - Accepted socket connection from /10.10.9.27:34017
2011-07-11 16:20:22,955 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@634] - EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
2011-07-11 16:20:22,955 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1435] - Closed socket connection for client /10.10.9.27:34017 (no session established for client)

Syscalls in the client:

/15: 3516.6191 so_socket(PF_INET, SOCK_STREAM, IPPROTO_IP, "", SOV_DEFAULT) = 11
/15: 3516.6192 setsockopt(11, tcp, TCP_NODELAY, 0xFD8A8ECC, 4, SOV_DEFAULT) = 0
/15: 3516.6193 fcntl(11, F_GETFL) = 2
/15: 3516.6194 fcntl(11, F_SETFL, FWRITE|FNONBLOCK) = 0
/15: 3516.6194 connect(11, 0x0813BA30, 16, SOV_DEFAULT) Err#150 EINPROGRESS
/15: 3516.6195 write(2, " 2 0 1 1 - 0 7 - 1 2 1".., 23) = 23
<<< SNIP writing log message >>>
/15: 3516.6204 write(2, "\n", 1) = 1
/15: 3516.6205 close(11) = 0


What does work:
===============

Using Java client on the same Solaris machine works without any problems. Connecting to the Linux server using C client library on Linux works OK (though I tested it within one box, via loopback interface).
2411 No Perforce job exists for this issue. 0 32709
8 years, 7 weeks, 1 day ago 0|i05ymn:
ZooKeeper ZOOKEEPER-1122

"start" and "stop" commands are not present in zkServer.cmd

Improvement Open Major Unresolved Unassigned Alexander Osadchiy Alexander Osadchiy 11/Jul/11 08:17   14/Dec/19 06:08   3.3.3 3.7.0 scripts   4 10   Windows Currently, the ZooKeeper server can be started and stopped on Unix-based systems using the script "bin/zkServer.sh":

bin/zkServer.sh start - to start server;
bin/zkServer.sh stop - to stop server.

There are no "start" and "stop" commands in script "zkServer.cmd" (for Windows).
patch 2412 No Perforce job exists for this issue. 2 42052
2 years, 39 weeks, 3 days ago 0|i07k9z:
ZooKeeper ZOOKEEPER-1121

Data cleanup / Eviction policy

Wish Open Major Unresolved Unassigned Matthew Matthew 08/Jul/11 15:13   08/Jul/11 15:13       server   0 1   We are using zookeeper to store versions of business objects in order to achieve coherence, distributed locks, etc. These business objects have limited lifespans (i.e. objects created over a week ago are rarely accessed), so effectively, after some time period, we do not need their information in zookeeper anymore. It would be nice to have a built-in tool or mechanism for expiring old data, much like how PurgeTxnLog cleans the snapshot and transaction log files.

Any thoughts on whether this can be supported or how it can be accomplished? Currently we are walking the tree and deleting nodes with an old mtime.
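The tree walk described above can be sketched as a post-order traversal that selects stale nodes children-first. The in-memory Node class is a hypothetical stand-in for znodes; against a live ensemble the same order would drive getChildren and delete calls, with mtime taken from each node's Stat (milliseconds since the epoch, so the cutoff should use the same unit).

```java
import java.util.ArrayList;
import java.util.List;

public class StaleNodeSweep {
    // Hypothetical in-memory stand-in for a znode: its path, mtime, and children.
    public static final class Node {
        final String path;
        final long mtime;
        final List<Node> children = new ArrayList<>();

        public Node(String path, long mtime) {
            this.path = path;
            this.mtime = mtime;
        }

        public Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /**
     * Post-order walk: appends to out the paths to delete, children before parents,
     * because a znode cannot be deleted while it still has children. A node is
     * selected only if its mtime is older than cutoff and all of its children were
     * selected too. Returns whether node itself was selected.
     */
    public static boolean collect(Node node, long cutoff, List<String> out) {
        boolean allChildrenSelected = true;
        for (Node child : node.children) {
            if (!collect(child, cutoff, out)) {
                allChildrenSelected = false;
            }
        }
        boolean selected = allChildrenSelected && node.mtime < cutoff;
        if (selected) {
            out.add(node.path);
        }
        return selected;
    }
}
```

When issuing the real deletes, passing each node's observed version to ZooKeeper.delete (instead of -1) would make the delete fail if the node was modified after the scan.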
2413 No Perforce job exists for this issue. 0 42053
8 years, 37 weeks, 6 days ago eviction policy 0|i07ka7:
ZooKeeper ZOOKEEPER-1120

recipes haven't been built in distribution package

Task Resolved Major Duplicate Unassigned Yanming Zhou Yanming Zhou 07/Jul/11 20:49   14/Dec/12 21:09 05/Jan/12 12:22 3.3.3   build, recipes   0 1   I downloaded zookeeper-3.3.3.tar.gz and did not find zookeeper-recipes.jar in dist-maven, so I tried to build it myself:

{noformat}
D:\packages\zookeeper-3.3.3\recipes\lock>ant
Buildfile: D:\packages\zookeeper-3.3.3\recipes\lock\build.xml

BUILD FAILED
D:\packages\zookeeper-3.3.3\recipes\lock\build.xml:19: Cannot find D:\packages\zookeeper-3.3.3\recipes\build-recipes.xml imported from D:\packages\zookeeper-3.3.3\recipes\lock\build.xml

Total time: 0 seconds
{noformat}

recipes/build-recipes.xml is not included in zookeeper-3.3.3.tar.gz.
2414 No Perforce job exists for this issue. 1 33330
7 years, 14 weeks, 5 days ago 0|i062gn:
ZooKeeper ZOOKEEPER-1119

zkServer stop command incorrectly reading comment lines in zoo.cfg

Bug Closed Major Fixed Patrick D. Hunt Glen Mazza Glen Mazza 07/Jul/11 06:33   23/Nov/11 14:22 25/Jul/11 18:22 3.3.3 3.4.0 scripts   0 1   Ubuntu Linux 10.04, JDK 6 Hello, adding the following commented-out dataDir to the zoo.cfg file (keeping the default one provided active):

{noformat}
# the directory where the snapshot is stored.
# dataDir=test123/data
dataDir=/export/crawlspace/mahadev/zookeeper/server1/data
{noformat}

and then running "sh zkServer.sh stop" shows that the script incorrectly reads the commented-out dataDir:

{noformat}
gmazza@gmazza-work:~/dataExt3/apps/zookeeper-3.3.3/bin$ sh zkServer.sh stop
JMX enabled by default
Using config: /media/NewDriveExt3_/apps/zookeeper-3.3.3/bin/../conf/zoo.cfg
Stopping zookeeper ...
error: could not find file test123/data
/export/crawlspace/mahadev/zookeeper/server1/data/zookeeper_server.pid
gmazza@gmazza-work:~/dataExt3/apps/zookeeper-3.3.3/bin$
{noformat}

If I change the commented-out line in zoo.cfg to "test123456/data" and run the stop command again I get:
error: could not find file test123456/data

showing that it's incorrectly doing a run-time read of the commented-out lines. (Difficult to completely confirm, but this problem doesn't appear to occur with the start command, only the stop one.)
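For comparison, a parser based on java.util.Properties skips comment lines automatically. Assuming the server reads zoo.cfg this way (an assumption here, though it is consistent with the observation that only the stop command misbehaves), the sample configuration resolves to the active dataDir. The class name is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class ZooCfgDataDir {
    // Returns the active dataDir value, or null if none is set.
    // java.util.Properties treats lines whose first non-blank character is
    // '#' or '!' as comments, so "# dataDir=test123/data" is ignored.
    public static String dataDir(String cfgText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(cfgText));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        return props.getProperty("dataDir");
    }
}
```

In zkServer.sh itself, the analogous change would be to match only lines that begin with dataDir (optionally after whitespace) rather than any line that contains it.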
3991 No Perforce job exists for this issue. 1 32710
8 years, 35 weeks, 2 days ago
Reviewed
0|i05ymv:
ZooKeeper ZOOKEEPER-1118

Inconsistent data after server crashes several times

Bug Resolved Critical Duplicate Unassigned Kurt Young Kurt Young 05/Jul/11 22:56   06/Jul/11 21:11 06/Jul/11 09:32 3.3.2   quorum   0 0   Redhat RHEL5 I think there is a bug when a Follower tries to sync data with the Leader.
Assume some operations were committed while one server was down. When that server restarts, it receives a NEWLEADER packet that includes the leader's last zxid, and the server sets its own lastProcessZxid to the leader's.
{code:title=Follower.java|borderStyle=solid}
void followLeader() throws InterruptedException {
    fzk.registerJMX(new FollowerBean(this, zk), self.jmxLocalPeerBean);
    try {
        InetSocketAddress addr = findLeader();
        try {
            connectToLeader(addr);
            long newLeaderZxid = registerWithLeader(Leader.FOLLOWERINFO); // get the last zxid from leader
            //check to see if the leader zxid is lower than ours
            //this should never happen but is just a safety check
            long lastLoggedZxid = self.getLastLoggedZxid();
            if ((newLeaderZxid >> 32L) < (lastLoggedZxid >> 32L)) {
                LOG.fatal("Leader epoch " + Long.toHexString(newLeaderZxid >> 32L)
                        + " is less than our epoch " + Long.toHexString(lastLoggedZxid >> 32L));
                throw new IOException("Error: Epoch of leader is lower");
            }
            syncWithLeader(newLeaderZxid); // set its own lastProcessZxid to leader's last zxid
{code}

Then, some COMMIT packets will be received by the server in order to sync the data with leader. And then, the leader will send an UPTODATE packet to server to take a snapshot.
{code:title=Follower.java|borderStyle=solid}
protected void processPacket(QuorumPacket qp) throws IOException {
    switch (qp.getType()) {
    case Leader.PING:
        ping(qp);
        break;
    case Leader.PROPOSAL:
        TxnHeader hdr = new TxnHeader();
        BinaryInputArchive ia = BinaryInputArchive
                .getArchive(new ByteArrayInputStream(qp.getData()));
        Record txn = SerializeUtils.deserializeTxn(ia, hdr);
        if (hdr.getZxid() != lastQueued + 1) {
            LOG.warn("Got zxid 0x"
                    + Long.toHexString(hdr.getZxid())
                    + " expected 0x"
                    + Long.toHexString(lastQueued + 1));
        }
        lastQueued = hdr.getZxid();
        fzk.logRequest(hdr, txn);
        break;
    case Leader.COMMIT:
        fzk.commit(qp.getZxid());
        break;
    case Leader.UPTODATE:
        fzk.takeSnapshot();
        self.cnxnFactory.setZooKeeperServer(fzk);
        break;
    case Leader.REVALIDATE:
        revalidate(qp);
        break;
    case Leader.SYNC:
        fzk.sync();
        break;
    }
}
{code}
Notice how differently the Follower treats COMMIT and UPTODATE packets. When it receives a COMMIT packet, the follower hands it to a request processor to deal with. But when it receives an UPTODATE packet, the follower takes a snapshot immediately. So it is possible for the server to take a snapshot before it has committed all the operations it missed. If the server then crashes again and recovers, it restores its data from that snapshot, so its data is now inconsistent with the leader's even though its last zxid is the same.
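A toy model of the ordering problem described above: COMMIT only queues work for a processor, while UPTODATE snapshots whatever has been applied so far. All names are hypothetical; this is neither the Follower code nor the fix adopted for this issue (which was resolved as a duplicate).

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SnapshotOrderSketch {
    // Committed zxids handed to the (asynchronous) commit processor but not yet applied.
    private final Deque<Long> pendingCommits = new ArrayDeque<>();
    // Highest zxid actually applied to the in-memory data tree.
    private long lastAppliedZxid;

    public SnapshotOrderSketch(long startZxid) {
        this.lastAppliedZxid = startZxid;
    }

    // COMMIT: queued for a processor thread, as the report describes.
    public void commit(long zxid) {
        pendingCommits.addLast(zxid);
    }

    // The processor thread applying one queued commit.
    public void applyOne() {
        Long zxid = pendingCommits.pollFirst();
        if (zxid != null) {
            lastAppliedZxid = zxid;
        }
    }

    // UPTODATE as reported: snapshot whatever has been applied so far.
    public long snapshotImmediately() {
        return lastAppliedZxid;
    }

    // Hypothetical alternative ordering: apply every queued commit first.
    public long snapshotAfterDrain() {
        while (!pendingCommits.isEmpty()) {
            applyOne();
        }
        return lastAppliedZxid;
    }
}
```

Starting from zxid 10 with commits 11 and 12 queued, an immediate snapshot captures state as of 10 even though the server already believes it is up to date through 12; draining the queue first captures 12.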
3992 No Perforce job exists for this issue. 0 32711
8 years, 38 weeks ago 0|i05yn3:
ZooKeeper ZOOKEEPER-1117

zookeeper 3.3.3 fails to build with gcc >= 4.6.1 on Debian/Ubuntu

Bug Closed Minor Fixed James Page James Page James Page 05/Jul/11 10:50   23/Nov/11 14:22 26/Aug/11 03:52 3.3.3, 3.4.0 3.3.4, 3.4.0 c client   0 2   Ubuntu Development Release (11.10/Oneiric Ocelot), Debian Unstable (sid) zookeeper 3.3.3 (and 3.3.1) fails to build on Debian and Ubuntu systems with gcc >= 4.6.1:

/bin/bash ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I./include -I./tests -I./generated -Wall -Werror -g -O2 -D_GNU_SOURCE -MT zookeeper.lo -MD -MP -MF .deps/zookeeper.Tpo -c -o zookeeper.lo `test -f 'src/zookeeper.c' || echo './'`src/zookeeper.c
libtool: compile: gcc -DHAVE_CONFIG_H -I. -I./include -I./tests -I./generated -Wall -Werror -g -O2 -D_GNU_SOURCE -MT zookeeper.lo -MD -MP -MF .deps/zookeeper.Tpo -c src/zookeeper.c -fPIC -DPIC -o .libs/zookeeper.o
src/zookeeper.c: In function 'getaddrs':
src/zookeeper.c:455:13: error: variable 'port' set but not used [-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors

See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=625441 for more information.
3993 No Perforce job exists for this issue. 7 32712
8 years, 30 weeks, 6 days ago
Reviewed
0|i05ynb:
ZooKeeper ZOOKEEPER-1116

Add MX4J Support to Zookeeper

Improvement Open Minor Unresolved Unassigned Erez Mazor Erez Mazor 03/Jul/11 03:43   03/Jul/11 03:43   3.3.4   server   0 0   It would be great to add MX4J support to ZooKeeper. If possible, it could follow the way Cassandra loads MX4J (it only starts if the mx4j jar is on the classpath; see https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/Mx4jTool.java).
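The "only if the jar is on the classpath" behaviour rests on a presence check like the one sketched below. The probed class name mx4j.tools.adaptor.http.HttpAdaptor is an assumption about MX4J's HTTP adaptor and should be checked against the MX4J jar in use; actually starting the adaptor (reflectively) is left out.

```java
public class OptionalMx4j {
    // Returns true if the named class can be loaded, without initializing it.
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, OptionalMx4j.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed MX4J class name; verify against the MX4J jar in use.
        String adaptor = "mx4j.tools.adaptor.http.HttpAdaptor";
        if (isOnClasspath(adaptor)) {
            System.out.println("MX4J found; the HTTP adaptor could be started reflectively here");
        } else {
            System.out.println("MX4J not on classpath; skipping");
        }
    }
}
```

Because the MX4J classes are only ever referenced by name, the server would compile and run without the jar, which is the property the Cassandra approach relies on.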

2415 No Perforce job exists for this issue. 0 42054
8 years, 38 weeks, 4 days ago 0|i07kaf:
ZooKeeper ZOOKEEPER-1115

follower can not sync with leader

Bug Resolved Critical Not A Problem Unassigned helei helei 01/Jul/11 03:20   21/Oct/13 23:06 10/Oct/13 16:38 3.3.0, 3.3.3   quorum   0 6   linux rhel 4, x64, java version 1.4.2 exception causing shutdown There are 5 members in the quorum. One follower cannot sync with the leader after a restart. It seems the leader has closed the data connection with this follower because of a read timeout. Here is the key log from the follower:
{noformat}
2011-06-30 22:14:45,069 - WARN [Thread-17:QuorumCnxManager$RecvWorker@658] - Connection broken:
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:113)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:156)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:629)
2011-06-30 22:14:45,069 - INFO [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@689] - Notification: 3, 17198470148, 3, 3, LOOKING, LOOKING, 3
2011-06-30 22:14:45,070 - ERROR [Thread-16:QuorumCnxManager$SendWorker@559] - Failed to send last message. Shutting down thread.
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.send(QuorumCnxManager.java:548)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:557)
2011-06-30 22:14:45,082 - INFO [QuorumPeer:/0.0.0.0:2181:Learner@282] - Getting a diff from the leader 0x4011bd462
2011-06-30 22:14:45,083 - WARN [Thread-18:QuorumCnxManager$SendWorker@589] - Send worker leaving thread
2011-06-30 22:14:45,085 - WARN [QuorumPeer:/0.0.0.0:2181:Follower@116] - Got zxid 0x4011bd405 expected 0x1
2011-06-30 22:14:45,090 - INFO [QuorumPeer:/0.0.0.0:2181:FileTxnSnapLog@208] - Snapshotting: 4011bd462
2011-06-30 22:14:53,397 - WARN [SyncThread:3:SendAckRequestProcessor@63] - Closing connection to leader, exception during packet send
java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
at org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:126)
at org.apache.zookeeper.server.quorum.SendAckRequestProcessor.flush(SendAckRequestProcessor.java:61)
at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:164)
at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:98)
2011-06-30 22:14:53,398 - WARN [QuorumPeer:/0.0.0.0:2181:Follower@82] - Exception when following the leader
java.net.SocketException: Socket closed
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:99)
at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
at org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:126)
at org.apache.zookeeper.server.quorum.Learner.ping(Learner.java:358)
at org.apache.zookeeper.server.quorum.Follower.processPacket(Follower.java:108)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:79)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:634)
2011-06-30 22:14:53,398 - WARN [SyncThread:3:SendAckRequestProcessor@63] - Closing connection to leader, exception during packet send
java.net.SocketException: Socket closed
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:99)
at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
at org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:126)
at org.apache.zookeeper.server.quorum.SendAckRequestProcessor.flush(SendAckRequestProcessor.java:61)
at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:164)
at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:98)
2011-06-30 22:14:53,399 - INFO [QuorumPeer:/0.0.0.0:2181:Follower@166] - shutdown called
java.lang.Exception: shutdown Follower
at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:638)
And these are the leader's logs:
2011-06-30 22:14:35,943 - ERROR [LearnerHandler-/10.23.247.163:14975:LearnerHandler@444] - Unexpected exception causing shutdown while sock still open
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at java.io.DataInputStream.readInt(DataInputStream.java:370)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:358)
2011-06-30 22:14:35,943 - WARN [LearnerHandler-/10.23.247.163:14975:LearnerHandler@457] - ******* GOODBYE /10.23.247.163:14975 ********
2011-06-30 22:14:48,943 - ERROR [CommitProcessor:4:NIOServerCnxn@422] - Unexpected Exception:
java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:395)
at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1360)
at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:367)
at org.apache.zookeeper.server.quorum.Leader$ToBeAppliedRequestProcessor.processRequest(Leader.java:535)
at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
2011-06-30 22:14:49,084 - ERROR [LearnerHandler-/10.23.247.163:14998:LearnerHandler@444] - Unexpected exception causing shutdown while sock still open
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at java.io.DataInputStream.readInt(DataInputStream.java:370)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:358)
2011-06-30 22:14:49,084 - WARN [LearnerHandler-/10.23.247.163:14998:LearnerHandler@457] - ******* GOODBYE /10.23.247.163:14998 ********
{noformat}
2416 No Perforce job exists for this issue. 0 32713
6 years, 24 weeks ago 0|i05ynj:
ZooKeeper ZOOKEEPER-1114

Concurrent primitives library - barrier

Improvement Open Trivial Unresolved Unassigned Chia-Hung Lin Chia-Hung Lin 01/Jul/11 00:22   01/Jul/11 00:23       recipes   0 1   GNU/ Debian
java 1.6.0_21
zookeeper trunk (svn info shows Revision: 1141788)
The patch is provided according to the wiki[1]. The source follows the description in the tutorial[2].

However, the mailing list suggests that this version is not optimized[3]. Can anyone point out which algorithm might give better results for this construct? I am happy to work on it, though it may take some time.

[1]. http://wiki.apache.org/hadoop/ZooKeeper/SoC2010Ideas#Concurrent_Primitives_Library
[2]. http://zookeeper.apache.org/doc/current/zookeeperTutorial.html#sc_barriers
[3]. http://mail-archives.apache.org/mod_mbox/zookeeper-user/201102.mbox/%3C87184214-59D4-4D64-A884-A6F07CE0F239@yahoo-inc.com%3E
gsoc2010 2417 No Perforce job exists for this issue. 1 42055
8 years, 38 weeks, 6 days ago 0|i07kan:
ZooKeeper ZOOKEEPER-1113

ZOOKEEPER-107 QuorumMaj counts the number of ACKs but does not check who sent the ACK

Sub-task Resolved Minor Fixed Alexander Shraer Alexander Shraer Alexander Shraer 30/Jun/11 19:00   07/Mar/13 01:46 07/Mar/13 01:46   3.5.0 quorum   0 3   If a server connects to the leader as a follower, it will be allowed to vote (with QuorumMaj) even if it is not a follower in the current configuration,
as the leader does not care who sends the ACK: it only counts the number of ACKs.
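A minimal sketch of the fix direction, assuming the leader knows the set of voting member ids for the current configuration (MembershipAwareQuorum and containsQuorum are hypothetical names, not the QuorumMaj API): count only ACKs whose sender is a voting member, then apply the usual strict-majority test.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not ZooKeeper's QuorumMaj: a majority check that
// ignores ACKs from servers that are not voting members of the current config.
final class MembershipAwareQuorum {
    private final Set<Long> votingMembers;

    MembershipAwareQuorum(Set<Long> votingMembers) {
        this.votingMembers = new HashSet<>(votingMembers); // defensive copy
    }

    /** True if the ACK senders include a strict majority of the voting members. */
    boolean containsQuorum(Set<Long> ackSenders) {
        int valid = 0;
        for (Long sid : ackSenders) {
            if (votingMembers.contains(sid)) {
                valid++; // count only ACKs from servers in the current configuration
            }
        }
        return valid > votingMembers.size() / 2;
    }
}
```

With voting members {1, 2, 3}, ACKs from {1, 4, 5} are three ACKs but only one valid vote, so they no longer form a quorum.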
2418 No Perforce job exists for this issue. 0 42056
7 years, 3 weeks ago 0|i07kav:
ZooKeeper ZOOKEEPER-1112

Add support for C client for SASL authentication

New Feature Resolved Major Fixed Damien Diederen Eugene Joseph Koontz Eugene Joseph Koontz 30/Jun/11 18:13   22/Jan/20 10:10 22/Jan/20 06:55   3.7.0     2 12 0 18000   Hopefully this would leverage the SASL server-side support provided by ZOOKEEPER-938. It would be similar to the Java SASL client support also provided in ZOOKEEPER-938.

Java has built-in SASL support, but I'm not sure what C libraries are available for SASL and, if any exist, whether they are compatible with the Apache license.
100% 100% 18000 0 pull-request-available 2419 No Perforce job exists for this issue. 7 42057
31 weeks, 3 days ago 0|i07kb3:
ZooKeeper ZOOKEEPER-1111

JMXEnv uses System.err instead of logging

Bug Closed Major Fixed Ivan Kelly Ivan Kelly Ivan Kelly 30/Jun/11 04:51   23/Nov/11 14:22 19/Jul/11 17:39   3.4.0     0 1   As stated in the title, org.apache.zookeeper.test.JMXEnv uses System.err.println to output traces. This makes for a lot of noise on the console when you run the tests. It has a logging object already, so it should use that instead. 3994 No Perforce job exists for this issue. 1 32714
8 years, 36 weeks, 1 day ago
Reviewed
0|i05ynr:
ZooKeeper ZOOKEEPER-1110

C interface zookeeper_close closes the fd too quickly.

Bug Resolved Major Invalid Unassigned xiliu xiliu 29/Jun/11 01:30   29/Jul/12 01:37 25/Apr/12 03:55 3.3.3   c client   1 1 1800 1800 0% linux platform. The correct way to close a client is for the client to send CLOSE_OP to the server and wait several seconds while the server processes the termination request and closes the fd.
But zookeeper_close gets the order wrong: adaptor_send_queue(zh, 3000) (line 2332) first waits for the timeout and only then sends the request.
The right order is to send the request first and then wait for the timeout. I changed it as follows:
$svn diff src/c/src/zookeeper.c
Index: src/c/src/zookeeper.c
===================================================================
--- src/c/src/zookeeper.c (revision 1140451)
+++ src/c/src/zookeeper.c (working copy)
@@ -2329,7 +2329,8 @@

/* make sure the close request is sent; we set timeout to an arbitrary
* (but reasonable) number of milliseconds since we want the call to block*/
- rc=adaptor_send_queue(zh, 3000);
+ rc=adaptor_send_queue(zh, 0);
+ sleep(3);
}else{
LOG_INFO(("Freeing zookeeper resources for sessionId=%#llx\n",
zh->client_id.client_id));
0% 0% 1800 1800 2420 No Perforce job exists for this issue. 0 32715
7 years, 48 weeks, 1 day ago 0|i05ynz:
ZooKeeper ZOOKEEPER-1109

Zookeeper service is down when SyncRequestProcessor meets any exception.

Bug Closed Critical Fixed Laxman Laxman Laxman 24/Jun/11 00:48   23/Nov/11 14:22 25/Jul/11 17:01 3.3.0, 3.3.1, 3.3.2, 3.3.3 3.4.0 quorum   0 4 259200 259200 0% *Problem* ZooKeeper does not shut down completely when the dataDir disk is full, and the ZK cluster goes into an unserviceable state.


*Scenario*
When the leader's disk fills up, ZooKeeper tries to shut down, but it waits indefinitely while shutting down the SyncRequestProcessor thread.

*Root Cause*
this.join() is called on the very thread that triggered System.exit(11).

When the disk fills up, the SyncRequestProcessor thread gets a 'No space left on device' exception and invokes System.exit(11) (the following logs show the same). Before the JVM exits, it runs the ShutdownHook of QuorumPeerMain, and the flow reaches SyncRequestProcessor.shutdown(). There, this.join() waits for the thread that invoked System.exit(11), but that thread is itself blocked inside System.exit(11) until the shutdown hooks have finished, so neither can proceed.
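A minimal sketch of one defensive pattern, assuming a bounded wait is acceptable during shutdown (BoundedShutdownWorker and its methods are hypothetical; this is not the fix that was committed): shutdown() signals the worker, refuses to join its own thread, and otherwise joins with a timeout, so a shutdown hook cannot hang on a thread that never finishes.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch, not ZooKeeper's actual fix: shutdown() bounds its join so
// a shutdown hook cannot wait forever on a thread that is itself stuck.
final class BoundedShutdownWorker extends Thread {
    private final CountDownLatch stopSignal = new CountDownLatch(1);
    private final boolean simulateStuck; // true: ignore the stop signal, like a thread blocked in System.exit()

    BoundedShutdownWorker(boolean simulateStuck) {
        this.simulateStuck = simulateStuck;
        setDaemon(true); // a stuck demo thread must not keep the JVM alive
    }

    @Override
    public void run() {
        try {
            CountDownLatch waitOn = simulateStuck ? new CountDownLatch(1) : stopSignal;
            waitOn.await(); // the private latch in the "stuck" case is never released
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Signals the worker and waits at most timeoutMs; returns true if the worker has exited. */
    boolean shutdown(long timeoutMs) {
        stopSignal.countDown();
        if (Thread.currentThread() == this) {
            return false; // joining ourselves would block forever
        }
        try {
            join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !isAlive();
    }
}
```

The trade-off is that a timed-out shutdown leaves the thread running; the caller learns this from the false return value instead of hanging.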
0% 0% 259200 259200 3995 No Perforce job exists for this issue. 2 32716
8 years, 35 weeks, 2 days ago
Reviewed
quorum, leader, disk full, shutdown 0|i05yo7:
ZooKeeper ZOOKEEPER-1108

Various bugs in zoo_add_auth in C

Bug Closed Blocker Fixed Dheeraj Agrawal Dheeraj Agrawal Dheeraj Agrawal 23/Jun/11 17:02   23/Nov/11 14:22 08/Sep/11 22:27 3.3.3 3.4.0 c client   0 6   3 issues:
1st issue: in zoo_add_auth there is a race condition:
// [ZOOKEEPER-800] zoo_add_auth should return ZINVALIDSTATE if
// the connection is closed.
if (zoo_state(zh) == 0) {
    return ZINVALIDSTATE;
}
zookeeper_init initializes the state to 0, and the check above returns ZINVALIDSTATE whenever the state is 0.
If the do_io thread is slow and has not yet changed the state to CONNECTING, zoo_add_auth ends up returning ZINVALIDSTATE.
The problem is that 0 is used for both the CLOSED and the UNINITIALIZED state. In the uninitialized case the call should be allowed through.

2nd issue:

Another bug: in send_auth_info, the loop condition is not correct:

while (auth->next != NULL) { // BUG: the last auth in the list is never sent; with a single auth nothing is sent, since its next is NULL
    rc = send_info_packet(zh, auth);
    auth = auth->next;
}

FIX IS:
while (auth != NULL) { // sends every auth, including the last one, and is also safe for an empty list
    rc = send_info_packet(zh, auth);
    auth = auth->next;
}

3rd issue:
add_last_auth(&zh->auth_h, authinfo);
zoo_unlock_auth(zh);

if (zh->state == ZOO_CONNECTED_STATE || zh->state == ZOO_ASSOCIATING_STATE)
    return send_last_auth_info(zh);

If the handle is connected, we only send the last auth info, which may differ from the one we just added, because the auth lock was released before sending it.

3996 No Perforce job exists for this issue. 5 32717
8 years, 28 weeks, 6 days ago
Reviewed
0|i05yof:
ZooKeeper ZOOKEEPER-1107

automating log and snapshot cleaning

New Feature Closed Major Fixed Laxman Jun Rao Jun Rao 23/Jun/11 10:48   23/Nov/11 14:22 02/Sep/11 16:51 3.3.3 3.4.0 server   3 4   I would like ZK itself to manage the number of snapshots and logs kept, instead of relying on the PurgeTxnLog utility.
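For reference, the feature that shipped in 3.4.0 is configured through the autopurge.snapRetainCount and autopurge.purgeInterval settings. The sketch below only illustrates the selection step of such a purge, assuming snapshot files are named snapshot.<hex zxid>; SnapshotRetention and its methods are hypothetical, and a real purge also has to remove the transaction logs that the retained snapshots no longer need.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a retention policy for automatic purging: keep the
// retainCount newest snapshots, ordered by the hex zxid in "snapshot.<zxid>".
final class SnapshotRetention {
    /** Parses the zxid suffix, e.g. "snapshot.4011bd462" -> 0x4011bd462. */
    static long zxidOf(String fileName) {
        return Long.parseLong(fileName.substring(fileName.lastIndexOf('.') + 1), 16);
    }

    /** Returns the snapshot names that fall outside the retainCount newest ones. */
    static List<String> toPurge(List<String> snapshots, int retainCount) {
        if (retainCount < 1) {
            throw new IllegalArgumentException("retainCount must be at least 1");
        }
        List<String> newestFirst = new ArrayList<>(snapshots);
        // Compare numerically: as strings, "snapshot.a" would sort after "snapshot.1f".
        newestFirst.sort((a, b) -> Long.compare(zxidOf(b), zxidOf(a)));
        int keep = Math.min(retainCount, newestFirst.size());
        return new ArrayList<>(newestFirst.subList(keep, newestFirst.size()));
    }
}
```

Sorting by the parsed zxid rather than by file name matters because hex suffixes of different lengths do not sort correctly as strings.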
3997 No Perforce job exists for this issue. 8 33331
8 years, 29 weeks, 5 days ago
Reviewed
0|i062gv:
ZooKeeper ZOOKEEPER-1106

mt C client dumps core when creating a node

Bug Open Major Unresolved zhang yafei jiang guangran jiang guangran 23/Jun/11 02:15   18/Mar/16 13:36   3.3.2   c client   0 0   In deserialize_CreateResponse:
rc = rc ? : in->deserialize_String(in, "path", &v->path);
In deserialize_String, len = -1,
so v->path is left uninitialised; it is then freed, which dumps core.

do_io thread
#0 0x00000039fb030265 in raise () from /lib64/libc.so.6
#1 0x00000039fb031d10 in abort () from /lib64/libc.so.6
#2 0x00000039fb06a84b in __libc_message () from /lib64/libc.so.6
#3 0x00000039fb0722ef in _int_free () from /lib64/libc.so.6
#4 0x00000039fb07273b in free () from /lib64/libc.so.6
#5 0x00002b0afd755dd1 in deallocate_String (s=0x5a490f40) at src/recordio.c:29
#6 0x00002b0afd754ade in zookeeper_process (zh=0x131e3870, events=<value optimized out>) at src/zookeeper.c:2071
#7 0x00002b0afd75b2ef in do_io (v=<value optimized out>) at src/mt_adaptor.c:310
#8 0x00000039fb8064a7 in start_thread () from /lib64/libpthread.so.0
#9 0x00000039fb0d3c2d in clone () from /lib64/libc.so.6

create_node thread
#0 0x00000039fb80ab99 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00002b0afd75af5c in wait_sync_completion (sc=0x131e4c90) at src/mt_adaptor.c:82
#2 0x00002b0afd751750 in zoo_create (zh=0x131e3870, path=0x13206fa8 "/jsq/zr2/hb/10.250.8.139:8102",
value=0x131e86a8 "\n\021\061\060.250.8.139:8102\022\035/home/shaoqiang/workdir2/qrs/\030\001 \001*%\n\020\n",
valuelen=102, acl=0x2b0afd961700, flags=1, path_buffer=0x0, path_buffer_len=0) at src/zookeeper.c:3028
2421 No Perforce job exists for this issue. 1 32718
4 years, 6 days ago 0|i05yon:
ZooKeeper ZOOKEEPER-1105

C client zookeeper_close does not send CLOSE_OP request to server

Bug Closed Major Fixed Mate Szalay-Beko jiang guangran jiang guangran 23/Jun/11 02:05   14/Feb/20 10:23 05/Feb/20 03:33 3.3.2, 3.4.3 3.6.0, 3.5.7, 3.7.0 c client   5 15 0 12600   zookeeper_close calls adaptor_finish before sending the CLOSE_OP request to the server,
so the CLOSE_OP request cannot be sent to the server.

The server's zookeeper.log contains many entries like:
2011-06-22 00:23:02,323 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@634] - EndOfStreamException: Unable to read additional data from client sessionid 0x1305970d66d2224, likely client has closed socket
2011-06-22 00:23:02,324 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1435] - Closed socket connection for client /10.250.8.123:60257 which had sessionid 0x1305970d66d2224
2011-06-22 00:23:02,325 - ERROR [CommitProcessor:1:NIOServerCnxn@445] - Unexpected Exception:
java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:418)
at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1509)
at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:367)
at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)

The Java client does not have this problem.
100% 100% 12600 0 pull-request-available 2422 No Perforce job exists for this issue. 5 2377
7 weeks, 2 days ago 0|i00rfr:
ZooKeeper ZOOKEEPER-1104

CLONE - In QuorumTest, use the same "for ( .. try { break } catch { } )" pattern in testFollowersStartAfterLeaders as in testSessionMove.

Improvement Closed Minor Fixed Eugene Joseph Koontz sreekanth sreekanth 22/Jun/11 14:21   23/Nov/11 14:22 14/Aug/11 20:54 3.4.0 3.4.0 tests   0 0   Patrick Hunt writes:

"Such uses of sleep [used in testFollowersStartAfterLeader] are just asking for trouble. Take a look at the use
of sleep in testSessionMove in the same class for a better way to do
this. I had gone through all the tests a while back, replacing all the
"sleep(x)" with something like this testSessionMove pattern (retry
with a max limit that's very long). During reviews we should look for
anti-patterns like this and address them before commit."

So, modify testFollowersStartAfterLeaders to use the same retrying approach that testSessionMove uses.
47478 No Perforce job exists for this issue. 3 33332
8 years, 32 weeks, 3 days ago Contains improvement to original patch (remove unneeded boolean variable).
Reviewed
0|i062h3:
ZooKeeper ZOOKEEPER-1103

In QuorumTest, use the same "for ( .. try { break } catch { } )" pattern in testFollowersStartAfterLeaders as in testSessionMove.

Improvement Closed Minor Fixed Eugene Joseph Koontz Eugene Joseph Koontz Eugene Joseph Koontz 21/Jun/11 17:23   23/Nov/11 14:22 22/Jun/11 15:47 3.3.3, 3.4.0 3.3.4, 3.4.0 tests   0 0   Patrick Hunt writes:

"Such uses of sleep [used in testFollowersStartAfterLeader] are just asking for trouble. Take a look at the use
of sleep in testSessionMove in the same class for a better way to do
this. I had gone through all the tests a while back, replacing all the
"sleep(x)" with something like this testSessionMove pattern (retry
with a max limit that's very long). During reviews we should look for
anti-patterns like this and address them before commit."

So, modify testFollowersStartAfterLeaders to use the same retrying approach that testSessionMove uses.
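A minimal sketch of the retry shape described above, assuming the condition can be expressed as a BooleanSupplier (checked exceptions such as KeeperException would need a custom functional interface); RetryUntil is a hypothetical helper, not part of the ZooKeeper test utilities.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper showing the "for (..) { try { ...; break; } catch { } }" shape:
// poll a condition up to maxAttempts times instead of sleeping once for a fixed time.
final class RetryUntil {
    static boolean retry(int maxAttempts, long pauseMs, BooleanSupplier condition) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (condition.getAsBoolean()) {
                    return true; // success: leave the loop (the "break")
                }
            } catch (RuntimeException e) {
                // transient failure, e.g. a server that is not up yet: retry
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false; // gave up after maxAttempts
    }
}
```

Compared with one long sleep, the test finishes as soon as the condition holds and still tolerates slow machines through a generous maxAttempts.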
47479 No Perforce job exists for this issue. 4 33333
8 years, 40 weeks, 1 day ago
Reviewed
0|i062hb:
ZooKeeper ZOOKEEPER-1102

Need update for programmer manual to cover multi operation

Bug Open Major Unresolved Unassigned Ted Dunning Ted Dunning 21/Jun/11 13:53   08/Aug/11 14:05           0 0   The new multi operation is undocumented as yet. Clearly it needs some doc to cover:

1) the basic syntax

2) java code sample

3) C code sample
2423 No Perforce job exists for this issue. 0 32719
8 years, 40 weeks, 2 days ago 0|i05yov:
ZooKeeper ZOOKEEPER-1101

Upload zookeeper-test maven artifacts to maven repository.

Bug Closed Major Fixed Patrick D. Hunt Ivan Kelly Ivan Kelly 21/Jun/11 13:15   23/Nov/11 14:22 01/Aug/11 14:31   3.4.0     0 1   These have been generated by ant package since ZOOKEEPER-1042; they just need to be pushed to a Maven repo. Bookkeeper requires this package to build. 47480 No Perforce job exists for this issue. 0 32720
8 years, 34 weeks, 3 days ago
Reviewed
0|i05yp3:
ZooKeeper ZOOKEEPER-1100

Killed (or missing) SendThread will cause hanging threads

Bug Resolved Major Fixed Camille Fournier Gunnar Wagenknecht Gunnar Wagenknecht 21/Jun/11 05:24   02/Mar/16 20:37 26/Dec/11 10:56 3.3.3 3.5.0 java client   0 5   http://mail-archives.apache.org/mod_mbox/zookeeper-user/201106.mbox/%3Citpgb6$2mi$1@dough.gmane.org%3E After investigating an issue with [hanging threads|http://mail-archives.apache.org/mod_mbox/zookeeper-user/201106.mbox/%3Citpgb6$2mi$1@dough.gmane.org%3E] I noticed that any java.lang.Error might silently kill the SendThread. Without a SendThread, any thread that wants to send something will hang forever.

Currently nothing detects a SendThread that has died. I think at least a state should be flipped (or a flag set) that causes all further send attempts to fail or to re-enter the connection loop.
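A minimal sketch of the suggested flag, assuming a single sender thread draining a queue (GuardedSender and its methods are hypothetical, not the ClientCnxn API): the thread catches Throwable, so a java.lang.Error is recorded instead of silently ending it, and a finally block flips the state that send() checks.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical sketch, not ClientCnxn: the sender thread records its own death
// (including java.lang.Error), so later send() calls fail fast instead of hanging.
final class GuardedSender {
    private final BlockingQueue<String> outgoing = new LinkedBlockingQueue<>();
    private volatile boolean alive = true;
    private volatile Throwable deathCause;
    private final Thread sender;

    GuardedSender(Consumer<String> transport) {
        sender = new Thread(() -> {
            try {
                while (true) {
                    transport.accept(outgoing.take());
                }
            } catch (Throwable t) { // also catches Error, which would otherwise end the thread silently
                deathCause = t;
            } finally {
                alive = false; // flip the state so callers notice
            }
        }, "SendThread-sketch");
        sender.setDaemon(true);
        sender.start();
    }

    void send(String packet) {
        if (!alive) {
            throw new IllegalStateException("sender thread has died", deathCause);
        }
        outgoing.add(packet);
    }

    /** Waits up to timeoutMs for the sender thread to exit (useful in tests). */
    void awaitSenderExit(long timeoutMs) {
        try {
            sender.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A real client would also have to fail packets that were already queued when the thread died; the sketch only covers new send attempts.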
2424 No Perforce job exists for this issue. 2 32721
8 years, 1 week, 1 day ago
Incompatible change
0|i05ypb:
ZooKeeper ZOOKEEPER-1099

Add simple examples to show the usage of zookeeper

Improvement Open Minor Unresolved Unassigned divya divya 20/Jun/11 14:48   20/Jun/11 14:52       java client   0 0   We used ZooKeeper to make one of our services highly available. I have written a sample program that shows how ZooKeeper can be used to make such a service highly available. Please review the attached client code. 2425 No Perforce job exists for this issue. 1 42058
8 years, 40 weeks, 3 days ago 0|i07kbb:
ZooKeeper ZOOKEEPER-1098

Upload native libraries as Maven artifacts

New Feature Resolved Minor Duplicate Unassigned Joey Echeverria Joey Echeverria 20/Jun/11 09:34   16/Jul/14 16:43 23/Apr/14 18:21   3.5.0     0 5   HBase is planning to make use of the native ZooKeeper libraries in order to have small session timeouts that aren't affected by GC pauses (see HBASE-1316). The current patch uses a custom maven packaging of the ZooKeeper native libraries. It would be nice if ZooKeeper published those artifacts as part of its release process. 2426 No Perforce job exists for this issue. 0 42059
5 years, 36 weeks, 1 day ago 0|i07kbj:
ZooKeeper ZOOKEEPER-1097

Quota is not correctly rehydrated on snapshot reload

Bug Closed Blocker Fixed Camille Fournier Camille Fournier Camille Fournier 16/Jun/11 10:07   23/Nov/11 14:22 26/Jun/11 19:30 3.3.3, 3.4.0 3.3.4, 3.4.0 server   0 1   traverseNode in DataTree will never actually traverse the limit nodes properly. 47481 No Perforce job exists for this issue. 7 32722
8 years, 39 weeks, 4 days ago
Reviewed
0|i05ypj:
ZooKeeper ZOOKEEPER-1096

Leader communication should listen on specified IP, not wildcard address

Improvement Closed Minor Fixed Germán Blanco Jared Cantwell Jared Cantwell 15/Jun/11 14:59   13/Mar/14 14:16 25/Sep/13 18:14 3.3.3, 3.4.0 3.4.6, 3.5.0 server   4 7   Server should specify the local address that is used for leader communication and leader election (and not use the default of listening on all interfaces). This is similar to the clientPortAddress parameter that was added a year ago. After reviewing the code, we can't think of a reason why only the port would be used with the wildcard interface, when servers are already connecting specifically to that interface anyway.

I have submitted a patch, but it does not account for all leader election algorithms.

Probably should have an option to toggle this, for backwards compatibility, although it seems like it would be a bug if this change broke things.

There is some more information about making it an option here:
http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-dev/201008.mbox/%3CAANLkTikkT97Djqt3CU=H2+7Gnj_4p28hgCXjh345HiyN@mail.gmail.com%3E
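A minimal sketch of the difference using plain java.net APIs. How the address would be taken from the configuration (for example the host part of a server.N entry) is an assumption, and ElectionListener is a hypothetical name: an InetSocketAddress built from a host binds to that interface only, while one built from a port alone uses the wildcard address.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical sketch contrasting a bind to one configured address with a wildcard bind.
final class ElectionListener {
    /** Binds a listening socket to addr; port 0 lets the OS pick a free port. */
    static ServerSocket bindTo(InetSocketAddress addr) throws IOException {
        ServerSocket ss = new ServerSocket(); // unbound until bind() is called
        try {
            ss.setReuseAddress(true);
            ss.bind(addr);
            return ss;
        } catch (IOException e) {
            ss.close(); // do not leak the socket if binding fails
            throw e;
        }
    }
}
```

For example, bindTo(new InetSocketAddress("10.0.0.5", 3888)) would accept election connections only on that interface, whereas bindTo(new InetSocketAddress(3888)) listens on all interfaces, which is the current behavior described above.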
33 No Perforce job exists for this issue. 8 42060
6 years, 2 weeks ago 0|i07kbr:
ZooKeeper ZOOKEEPER-1095

Simple leader election recipe

Improvement Closed Major Fixed Eric Sammer Henry Robinson Henry Robinson 15/Jun/11 13:04   17/May/14 08:03 07/Jul/11 18:55 3.3.3 3.4.0     2 5   Leader election recipe originally contributed to ZOOKEEPER-1080. 47482 No Perforce job exists for this issue. 2 33334
8 years, 37 weeks, 6 days ago Adds an implementation of the leader election recipe
Reviewed
0|i062hj:
ZooKeeper ZOOKEEPER-1094

Small improvements to LeaderElection and Vote classes

Improvement Closed Minor Fixed Henry Robinson Henry Robinson Henry Robinson 13/Jun/11 18:14   23/Nov/11 14:22 16/Jun/11 19:34   3.4.0 quorum   0 0   1. o.a.z.q.Vote is a struct-style class, whose fields are public and not final.

In general, we should prefer making the fields of these kinds of classes final, and hiding them behind getters for the following reasons:

* Marking them as final allows clients of the class not to worry about any synchronisation when accessing the fields
* Hiding them behind getters allows us to change the implementation of the class without changing the API.

Object creation is very cheap. It's ok to create new Votes rather than mutate existing ones.

2. Votes are mainly used in the LeaderElection class. In this class a map of addresses to votes is passed in to countVotes, which modifies the map contents inside an iterator (and therefore changes the object passed in by reference). This is pretty gross, so at the same time I've slightly refactored this method to return information about the number of validVotes in the ElectionResult class, which is returned by countVotes.

3. The previous implementation of countVotes was quadratic in the number of votes. It is possible to do this linearly. No real speed-up is expected as a result, but it salves the CS OCD in me :)
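A minimal sketch of points 1 to 3, with hypothetical names (votes are keyed by a String address here rather than the original address type): Vote is immutable with getters, and countVotes tallies votes in one pass over a hash map instead of comparing every pair, returning the counts in an ElectionResult instead of modifying the map it was given.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the ideas above, not the committed code.
final class Vote {
    private final long id;   // proposed leader's server id
    private final long zxid; // last zxid seen by the proposer

    Vote(long id, long zxid) {
        this.id = id;
        this.zxid = zxid;
    }

    long getId() { return id; }
    long getZxid() { return zxid; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Vote)) return false;
        Vote other = (Vote) o;
        return id == other.id && zxid == other.zxid;
    }

    @Override
    public int hashCode() {
        return 31 * Long.hashCode(id) + Long.hashCode(zxid);
    }
}

final class ElectionResult {
    final Vote winner;      // most common vote, or null if there were no votes
    final int winningCount; // number of servers that cast the winning vote
    final int validVotes;   // number of votes counted

    ElectionResult(Vote winner, int winningCount, int validVotes) {
        this.winner = winner;
        this.winningCount = winningCount;
        this.validVotes = validVotes;
    }
}

final class VoteCounter {
    /** One pass over the votes with a hash-map tally: O(n) expected time, input untouched. */
    static ElectionResult countVotes(Map<String, Vote> votesByAddress) {
        Map<Vote, Integer> tally = new HashMap<>();
        Vote winner = null;
        int best = 0;
        for (Vote v : votesByAddress.values()) {
            int count = tally.merge(v, 1, Integer::sum);
            if (count > best) {
                best = count;
                winner = v;
            }
        }
        return new ElectionResult(winner, best, votesByAddress.size());
    }
}
```

When two votes tie, the one that reaches the highest count first wins, so the result depends on map iteration order; a real implementation would probably want an explicit tie-break.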

47483 No Perforce job exists for this issue. 2 33335
8 years, 41 weeks ago
Reviewed
0|i062hr:
ZooKeeper ZOOKEEPER-1093

ZooKeeper quotas will always trigger if set on one criteria but not the other

Bug Resolved Major Duplicate Camille Fournier Camille Fournier Camille Fournier 13/Jun/11 16:05   19/Jul/11 19:55 19/Jul/11 19:55 3.3.3, 3.4.0   server   0 0   /testing has a quota on bytes but not on node count. The count quota will always fire because its limit is set to -1, so the comparison always fails.

2011-06-13 16:01:53,492 - WARN [CommitProcessor:3:DataTree@373] - Quota exceeded: /testing count=4 limit=-1
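A minimal sketch of a comparison that avoids the false warning, assuming, as in the report, that an unset limit is stored as -1 (QuotaCheck is a hypothetical name, not the DataTree code): a negative limit is treated as "no limit" instead of being compared.

```java
// Hypothetical sketch: a negative limit means "not set" and never counts as exceeded.
final class QuotaCheck {
    static boolean exceeds(long current, long limit) {
        return limit >= 0 && current > limit;
    }

    /** Fires only for the criteria that actually have a limit. */
    static boolean anyExceeded(long count, long countLimit, long bytes, long bytesLimit) {
        return exceeds(count, countLimit) || exceeds(bytes, bytesLimit);
    }
}
```

With this check, the /testing case above (count=4, limit=-1) no longer triggers, while a byte quota of 1000 still fires once the data exceeds 1000 bytes.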

67451 No Perforce job exists for this issue. 1 32723
8 years, 36 weeks, 2 days ago 0|i05ypr:
ZooKeeper ZOOKEEPER-1092

get rid of pending changes

Improvement Open Minor Unresolved Unassigned Benjamin Reed Benjamin Reed 11/Jun/11 21:49   04/Nov/11 12:20           0 1   pending changes used by PrepRequestProcessor and FinalRequestProcessor is complicated and requires synchronization between threads. 2427 No Perforce job exists for this issue. 0 42061
8 years, 20 weeks, 6 days ago 0|i07kbz:
ZooKeeper ZOOKEEPER-1091

When the chrootPath of ClientCnxn is not null, the ZooKeeper watches are not empty, and the method primeConnection(SelectionKey k) of ClientCnxn runs again for some reason, the wrong watcher clientPath is sent to the server

Bug Closed Critical Duplicate Unassigned zhangyouming zhangyouming 09/Jun/11 23:21   23/Nov/11 14:22 16/Oct/11 14:21 3.3.3 3.4.0 java client   0 3 3600 3600 0% Linux version 2.6.18-194.el5 (mockbuild@builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Fri Apr 2 14:58:14 EDT 2010 If the chrootPath of ClientCnxn is not null and the ZooKeeper watches are not empty, and then for some reason (e.g. the ZooKeeper server is stopped and restarted) the client calls primeConnection again, it re-sends the watcher paths to the server. But those paths are wrong: they should be server paths, not client paths. When the wrong watcher clientPath is sent to the server,
the following exception occurs:

2011-06-10 04:33:16,935 [pool-2-thread-30-SendThread(DB1-6:2181)] WARN org.apache.zookeeper.ClientCnxn - Session 0x5302c4403a30232 for server DB1-6/192.168.1.6:2181, unexpected error, closing socket connection and attempting reconnect
java.lang.StringIndexOutOfBoundsException: String index out of range: -6
at java.lang.String.substring(String.java:1937)
at java.lang.String.substring(String.java:1904)
at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:794)
at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:881)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1130)
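A minimal sketch of the translation the client needs, with hypothetical names (this is not the ClientCnxn code): paths sent to the server, including watch paths re-registered by primeConnection, get the chroot prefix, and paths coming back have it stripped. The strip step checks the prefix before calling substring; calling substring(chroot.length()) on a path shorter than the chroot fails with a StringIndexOutOfBoundsException, consistent with the negative index in the trace above.

```java
// Hypothetical sketch of chroot path translation (not ClientCnxn's code).
final class ChrootPaths {
    private final String chroot; // e.g. "/app"; null means no chroot

    ChrootPaths(String chroot) {
        this.chroot = chroot;
    }

    /** Client path -> server path, e.g. when re-registering watches on reconnect. */
    String toServer(String clientPath) {
        if (chroot == null) return clientPath;
        return "/".equals(clientPath) ? chroot : chroot + clientPath;
    }

    /** Server path -> client path; rejects paths that are not under the chroot. */
    String toClient(String serverPath) {
        if (chroot == null) return serverPath;
        if (serverPath.equals(chroot)) return "/";
        if (!serverPath.startsWith(chroot + "/")) {
            throw new IllegalArgumentException("path " + serverPath + " is outside chroot " + chroot);
        }
        return serverPath.substring(chroot.length());
    }
}
```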
0% 0% 3600 3600 2428 No Perforce job exists for this issue. 0 32724
8 years, 26 weeks, 6 days ago 0|i05ypz:
ZooKeeper ZOOKEEPER-1090

Race condition while taking snapshot can lead to not restoring data tree correctly

Bug Closed Critical Fixed Vishal Kher Vishal Kher Vishal Kher 09/Jun/11 10:24   23/Nov/11 14:22 28/Jul/11 01:50 3.3.3 3.4.0 server   0 4   I think I have found a bug in the snapshot mechanism.

The problem occurs because dt.lastProcessedZxid is not synchronized (or rather set before the data tree is modified):

FileTxnSnapLog:
{code}
public void save(DataTree dataTree,
        ConcurrentHashMap<Long, Integer> sessionsWithTimeouts)
        throws IOException {
    long lastZxid = dataTree.lastProcessedZxid;
    LOG.info("Snapshotting: " + Long.toHexString(lastZxid));
    File snapshot = new File(
            snapDir, Util.makeSnapshotName(lastZxid));
    snapLog.serialize(dataTree, sessionsWithTimeouts, snapshot); // <=== the DataTree may not have the modification for lastProcessedZxid
}
{code}

DataTree:
{code}
public ProcessTxnResult processTxn(TxnHeader header, Record txn) {
    ProcessTxnResult rc = new ProcessTxnResult();

    String debug = "";
    try {
        rc.clientId = header.getClientId();
        rc.cxid = header.getCxid();
        rc.zxid = header.getZxid();
        rc.type = header.getType();
        rc.err = 0;
        if (rc.zxid > lastProcessedZxid) {
            lastProcessedZxid = rc.zxid;
        }
        [...modify data tree...]
    }
{code}
The lastProcessedZxid must be set after the modification is done.

As a result, if server crashes after taking the snapshot (and the snapshot does not contain change corresponding to lastProcessedZxid) restore will not restore the data tree correctly:
{code}
public long restore(DataTree dt, Map<Long, Integer> sessions,
        PlayBackListener listener) throws IOException {
    snapLog.deserialize(dt, sessions);
    FileTxnLog txnLog = new FileTxnLog(dataDir);
    TxnIterator itr = txnLog.read(dt.lastProcessedZxid + 1); // <=== assumes lastProcessedZxid is deserialized
}
{code}
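A toy model of the proposed ordering, with hypothetical names (not the real DataTree or FileTxnSnapLog): the single applying thread modifies the tree before publishing lastProcessedZxid through a volatile write, and the snapshot reads that zxid before copying the nodes, so the copy contains at least every change up to the recorded zxid.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model with hypothetical names, not DataTree: apply the change first, then
// advance lastProcessedZxid, so a concurrent snapshot never records a zxid whose
// change it does not contain.
final class ToyDataTree {
    static final class Snapshot {
        final long zxid;
        final Map<String, String> data;

        Snapshot(long zxid, Map<String, String> data) {
            this.zxid = zxid;
            this.data = data;
        }
    }

    private final Map<String, String> nodes = new ConcurrentHashMap<>();
    private volatile long lastProcessedZxid;

    /** Called by a single applying thread. */
    void processTxn(long zxid, String path, String data) {
        nodes.put(path, data);           // 1. modify the tree
        if (zxid > lastProcessedZxid) {
            lastProcessedZxid = zxid;    // 2. then publish the zxid (volatile write)
        }
    }

    /** May run concurrently with processTxn (a "fuzzy" snapshot). */
    Snapshot snapshot() {
        long zxid = lastProcessedZxid;   // read the zxid first...
        return new Snapshot(zxid, new HashMap<>(nodes)); // ...so the copy includes its change
    }
}
```

The copy may also contain changes newer than the recorded zxid; replaying the log from zxid + 1 on restore then re-applies them, which this sketch assumes is harmless (idempotent transactions).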


I have had offline discussion with Ben and Camille on this. I will be posting the discussion shortly.
persistence, server, snapshot 47484 No Perforce job exists for this issue. 1 32725
8 years, 33 weeks, 1 day ago
Reviewed
0|i05yq7:
ZooKeeper ZOOKEEPER-1089

zkServer.sh status does not work due to invalid option of nc

Bug Resolved Major Fixed Roman Shaposhnik William Au William Au 09/Jun/11 09:57   28/Dec/11 05:58 28/Dec/11 01:08 3.3.4, 3.4.0 3.4.3, 3.3.5, 3.5.0 scripts   0 4   The nc command used by zkServer.sh does not have the "-q" option on some systems (I have checked RedHat/Fedora and FreeBSD). 2429 No Perforce job exists for this issue. 2 32726
8 years, 13 weeks, 1 day ago
Reviewed
0|i05yqf:
ZooKeeper ZOOKEEPER-1088

delQuota does not remove the quota node and subsequent setquota calls for that path will fail

Bug Resolved Major Won't Fix Camille Fournier Camille Fournier Camille Fournier 08/Jun/11 16:04   17/Nov/11 01:05 13/Jun/11 16:04 3.3.3   server   0 1 86400 86400 0% setquota -b 1000 /testing
delquota -b /testing
setquota -n 1024 /testing
Command failed: java.lang.IllegalArgumentException: /testing has a parent /zookeeper/quota/testing which has a quota
0% 0% 86400 86400 71162 No Perforce job exists for this issue. 0 32727
8 years, 41 weeks, 1 day ago 0|i05yqn:
ZooKeeper ZOOKEEPER-1087

ForceSync VM argument not working when set to "no"

Bug Closed Blocker Fixed Nate Putnam Ankit Patel Ankit Patel 06/Jun/11 16:09   23/Nov/11 14:22 21/Jun/11 01:34 3.3.3 3.3.4, 3.4.0 scripts   0 2 300 300 0% Cannot use forceSync=no to asynchronously write transaction logs. This is a critical bug, please address it ASAP. More details:

The class org.apache.zookeeper.server.persistence.FileTxnLog initializes the forceSync property in a static block. However, the static variable is declared after the static block with a default value of true. Therefore, the value of the variable can never be false. Please move the declaration of the variable before the static block.
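A minimal reproduction of the pitfall, with hypothetical class names: Java runs static blocks and static field initializers in textual order, so a later `= true` silently overwrites what an earlier static block assigned. The property name zookeeper.forceSync comes from the report; treating the value "no" as "disable syncing" follows the report's description.

```java
// Hypothetical reproduction of the initialization-order bug and its fix.
final class BrokenTxnLog {
    static {
        // Runs first: honours -Dzookeeper.forceSync=no ...
        forceSync = !System.getProperty("zookeeper.forceSync", "yes").equals("no");
    }
    // ...but this initializer runs afterwards (textual order) and overwrites it.
    static boolean forceSync = true;
}

final class FixedTxnLog {
    // Declaration and initialization in one place, so nothing overwrites it later.
    static boolean forceSync = !System.getProperty("zookeeper.forceSync", "yes").equals("no");
}
```

Assigning to a field before its declaration is legal Java (a forward reference on the left-hand side of an assignment), which is why the compiler does not flag the broken version.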
0% 0% 300 300 47485 No Perforce job exists for this issue. 3 32728
8 years, 40 weeks, 2 days ago Respect the "zookeeper.forceSync" system property.
Reviewed
0|i05yqv:
ZooKeeper ZOOKEEPER-1086

zookeeper test jar has non mavenised dependency.

Bug Closed Major Fixed Ivan Kelly Ivan Kelly Ivan Kelly 01/Jun/11 06:52   23/Nov/11 14:22 19/Oct/11 02:56   3.4.0     0 2   The zookeeper test jar, (zookeeper-<version>-test.jar) depends on accessive.jar which is not available in maven. This is problematic for projects using the test jar (i.e. hedwig). 177 No Perforce job exists for this issue. 2 32729
8 years, 23 weeks, 1 day ago
Reviewed
0|i05yr3:
ZooKeeper ZOOKEEPER-1085

CLONE - Deploy ZooKeeper jars/artifacts to a Maven Repository

Task Resolved Critical Not A Problem Patrick D. Hunt Michael Duergner Michael Duergner 01/Jun/11 03:22   01/Oct/13 20:10 01/Oct/13 20:10 3.0.0   build   0 0   Looks like 3.3.2 and 3.3.3 didn't get deployed on the Apache Maven Repository 2430 No Perforce job exists for this issue. 0 42062
8 years, 43 weeks, 1 day ago 0|i07kc7:
ZooKeeper ZOOKEEPER-1084

Hard-coding a well-known location for configuration directory gives less flexibility for packaging Zookeeper configurations

Improvement Resolved Minor Duplicate Roman Shaposhnik Roman Shaposhnik Roman Shaposhnik 31/May/11 18:55   21/Jun/11 13:33 21/Jun/11 13:33 3.3.2   scripts   0 0   Currently, Zookeeper relies on zkEnv.sh logic to discover the location of the configuration directory if none is specified:

{noformat}
# We use ZOOCFGDIR if defined,
# otherwise we use /etc/zookeeper
# or the conf directory that is
# a sibling of this script's directory
if [ "x$ZOOCFGDIR" = "x" ]
then
    if [ -d "/etc/zookeeper" ]
    then
        ZOOCFGDIR="/etc/zookeeper"
    else
        ZOOCFGDIR="$ZOOBINDIR/../conf"
    fi
fi
{noformat}

The problem with this approach is that the mere presence of /etc/zookeeper (for whatever reason) tricks this logic into believing that it is THE place. It would be much nicer to follow the lead of other Apache Hadoop projects and restrict the logic to $ZOOCFGDIR and $ZOOBINDIR/../conf.

Note that if this change is made, the existing behavior of picking up /etc/zookeeper can always be restored by creating a symlink at $ZOOBINDIR/../conf pointing to it.
37452 No Perforce job exists for this issue. 1 30002
8 years, 40 weeks, 2 days ago 0|i05hxj:
ZooKeeper ZOOKEEPER-1083

Javadoc for WatchedEvent not being generated

Bug Closed Major Fixed Ivan Kelly Ivan Kelly Ivan Kelly 31/May/11 12:23   23/Nov/11 14:22 13/Jun/11 13:25   3.4.0     0 1   See title. 47486 No Perforce job exists for this issue. 1 32730
8 years, 40 weeks, 6 days ago
Reviewed
0|i05yrb:
ZooKeeper ZOOKEEPER-1082

ZOOKEEPER-335 modify leader election to correctly take into account current epoch

Sub-task Closed Major Fixed Flavio Paiva Junqueira Benjamin Reed Benjamin Reed 30/May/11 11:16   23/Nov/11 14:22 14/Jun/11 01:14   3.4.0 server   0 1   When comparing zxids for leader election, the current epoch of the peer needs to be taken into account. 47487 No Perforce job exists for this issue. 2 33336
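One way to take the epoch into account is to compare votes by the peer's current epoch first, then by zxid, and only then by server id as a tie-breaker. The `Vote` and `VoteOrder` types below are hypothetical, and the ordering is an assumption for illustration rather than the committed change; the point is that a zxid alone can understate how current a peer is if the peer has moved to a new epoch without yet logging a transaction in it.

```java
// Hypothetical vote: the proposer's current epoch, last zxid and server id.
final class Vote {
    final long epoch;
    final long zxid;
    final long sid;

    Vote(long epoch, long zxid, long sid) {
        this.epoch = epoch;
        this.zxid = zxid;
        this.sid = sid;
    }
}

final class VoteOrder {
    private VoteOrder() {}

    /** Returns true if the proposed vote should replace the current one. */
    static boolean supersedes(Vote proposed, Vote current) {
        if (proposed.epoch != current.epoch) {
            return proposed.epoch > current.epoch;   // newer epoch always wins
        }
        if (proposed.zxid != current.zxid) {
            return proposed.zxid > current.zxid;     // then the more up-to-date log
        }
        return proposed.sid > current.sid;           // deterministic tie-breaker
    }
}
```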
8 years, 40 weeks, 6 days ago Committed revision 1135382. 0|i062hz:
ZooKeeper ZOOKEEPER-1081

ZOOKEEPER-335 modify leader/follower code to correctly deal with new leader

Sub-task Closed Major Fixed Benjamin Reed Benjamin Reed Benjamin Reed 30/May/11 11:15   23/Nov/11 14:22 14/Jun/11 01:14   3.4.0 server   0 1   The leader and follower code needs to be modified to correctly handle and log epoch changes. 47488 No Perforce job exists for this issue. 2 33337
8 years, 40 weeks, 6 days ago Committed revision 1135382. 0|i062i7:
ZooKeeper ZOOKEEPER-1080

Provide a Leader Election framework based on Zookeeper recipe

New Feature Resolved Major Duplicate Hari A V Hari A V Hari A V 30/May/11 09:24   17/May/14 08:03 17/May/14 08:03 3.3.2 3.5.0 contrib   6 21   Currently, Hadoop components such as the NameNode and JobTracker are single points of failure.
If the NameNode or JobTracker goes down, its service is unavailable until it is up and running again. If a Standby NameNode or JobTracker were available and ready to serve when the Active node goes down, service downtime could be reduced. Hadoop already provides a Standby NameNode implementation, but it is not a fully "hot" standby.
The common problem to be addressed in any such Active-Standby cluster is Leader Election and Failure detection. This can be done using Zookeeper as mentioned in the Zookeeper recipes.
http://zookeeper.apache.org/doc/r3.3.3/recipes.html


+Leader Election Service (LES)+

Any node that wants to participate in leader election can use this service. A node starts the service with the required configuration, and the service tells it whether it should start in Active or Standby mode. The service also notifies nodes of any mode changes at runtime. All other complexities can be handled internally by the LES.
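The recipe linked above has each candidate create an EPHEMERAL_SEQUENTIAL znode under a shared election path. The candidate with the lowest sequence number leads, and every other candidate watches only its immediate predecessor. The sketch below covers just that decision logic, given the child names returned by `getChildren()`. `ElectionSketch` is a hypothetical helper, and creating znodes, setting watches and handling session expiry are deliberately left out:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: decides leadership from the child names of an
// election znode, e.g. "n_0000000003", created as EPHEMERAL_SEQUENTIAL.
final class ElectionSketch {
    private ElectionSketch() {}

    // ZooKeeper appends a 10-digit, zero-padded counter to sequential names.
    static long sequenceOf(String name) {
        return Long.parseLong(name.substring(name.length() - 10));
    }

    /** The candidate with the lowest sequence number is the leader. */
    static String leader(List<String> children) {
        return children.stream()
                .min(Comparator.comparingLong(ElectionSketch::sequenceOf))
                .orElseThrow(() -> new IllegalArgumentException("no candidates"));
    }

    /**
     * Returns the znode a candidate should watch: its immediate predecessor,
     * or null if the candidate is itself the leader. Watching only the
     * predecessor avoids waking every candidate when the leader goes away.
     */
    static String predecessorToWatch(List<String> children, String self) {
        List<String> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingLong(ElectionSketch::sequenceOf));
        int idx = sorted.indexOf(self);
        if (idx < 0) {
            throw new IllegalArgumentException(self + " is not a candidate");
        }
        return idx == 0 ? null : sorted.get(idx - 1);
    }
}
```

When the watched predecessor is deleted, a candidate would list the children again and re-run `leader()` and `predecessorToWatch()`. Only the candidate that now holds the lowest sequence number becomes Active, so a leader failure does not wake every Standby at once.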
2431 No Perforce job exists for this issue. 4 42063
5 years, 44 weeks, 5 days ago 0|i07kcf:
ZooKeeper ZOOKEEPER-1079

'Create' command in HBase creates a table but sends a 'Delete' request to ZooKeeper!

Test Resolved Major Not A Problem Unassigned Mohamad Koohi-Moghadam Mohamad Koohi-Moghadam 29/May/11 02:41   08/Jun/11 12:24 08/Jun/11 12:24 3.3.3       0 0   When using 'create' in HBase, the log shows "Got user-level KeeperException... type:delete", and ZooKeeper raises a NoNode exception. Also, if a znode named 'mkm' is created in the ZooKeeper shell and then create 'mkm', 'm' is run in the HBase shell, that command deletes 'mkm' from ZooKeeper!

Linux Ubuntu
Zookeeper and Hbase
71569 No Perforce job exists for this issue. 0 33338
8 years, 42 weeks, 1 day ago 0|i062if:
ZooKeeper ZOOKEEPER-1078

add maven build support to ZooKeeper

Improvement Resolved Major Duplicate Mohammad Arshad Patrick D. Hunt Patrick D. Hunt 26/May/11 20:00   11/Feb/19 06:45 11/Feb/19 06:45     build   4 17   I've taken a stab at creating a maven build for ZooKeeper. (attachment to follow).
2432 No Perforce job exists for this issue. 5 2565
1 year, 5 weeks, 3 days ago 0|i00slj:
ZooKeeper ZOOKEEPER-1077

C client lib doesn't build on Solaris

Bug Closed Critical Fixed Chris Nauroth Tadeusz Andrzej Kadłubowski Tadeusz Andrzej Kadłubowski 26/May/11 04:35   21/Jul/16 16:18 18/May/15 03:39 3.3.4 3.4.7, 3.5.2, 3.6.0 build, c client   0 7   uname -a: SunOS [redacted] 5.10 Generic_142910-17 i86pc i386 i86pc
GNU toolchain (gcc 3.4.3, GNU Make etc.)
Hello,

Some minor trouble with building ZooKeeper C client library on Sun^H^H^HOracle Solaris 5.10.

1. You need to link against "-lnsl -lsocket"

2. ctime_r needs a buffer size. The signature is: "char *ctime_r(const time_t *clock, char *buf, int buflen)"

3. In zk_log.c you need to manually cast pid_t to int (-Werror can be cumbersome ;) )

4. On Solaris, getpwuid_r() returns a pointer to struct passwd; on Linux, that pointer is returned through the last parameter instead.

Solaris signature: struct passwd *getpwuid_r(uid_t uid, struct passwd *pwd, char *buffer, int buflen);
Linux signature: int getpwuid_r(uid_t uid, struct passwd *pwd, char *buf, size_t buflen, struct passwd **result);
2433 No Perforce job exists for this issue. 4 32731
4 years, 44 weeks, 3 days ago Support for building C client lib on Illumos (and presumably OpenSolaris). Configure with "CPPFLAGS=-D_POSIX_PTHREAD_SEMANTICS LDFLAGS="-lnsl -lsocket" ./configure" 0|i05yrj:
ZooKeeper ZOOKEEPER-1076

some quorum tests are unnecessarily extending QuorumBase

Bug Closed Minor Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 25/May/11 18:19   23/Nov/11 14:22 29/Jul/11 04:14 3.4.0 3.4.0 tests   0 1   Some tests are unnecessarily extending QuorumBase. Typically this is not a big issue, but it may cause more servers than necessary to be started, which in particular makes a failing test harder to debug.
47489 No Perforce job exists for this issue. 2 32732
8 years, 33 weeks, 1 day ago
Reviewed
0|i05yrr:
ZooKeeper ZOOKEEPER-1075

Zookeeper Server cannot join an existing ensemble if the existing ensemble doesn't already have a quorum

Bug Resolved Major Not A Problem Unassigned Vishal Kathuria Vishal Kathuria 25/May/11 17:39   27/May/11 03:07 26/May/11 14:19 3.3.2   leaderElection   0 4 172800 172800 0% Windows 7 Here is the sequence of steps that reproduces the problem.
On a 3 server ensemble,
1. Bring up two servers (say 1 and 2). Let's say 1 is leading.
2. Bring down 2
3. Bring up 2.
4. 2 gets a notification from 1 that it is leading but 2 doesn't accept it as a leader since it cannot find one other node that thinks 1 is the leader.


So the ensemble gets stuck with 2 not following. If 3 comes up at this point, one of 2 and 3 will become the leader while 1 keeps thinking it is the leader.


I am working on a patch to fix this issue.
0% 0% 172800 172800 214218 No Perforce job exists for this issue. 1 32733
8 years, 43 weeks, 6 days ago 0|i05yrz:
ZooKeeper ZOOKEEPER-1074

zkServer.sh is missing nohup/sleep, which are necessary for remote invocation

Bug Closed Major Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 25/May/11 14:27   23/Nov/11 14:22 27/Jun/11 01:04 3.3.3, 3.4.0 3.4.0 scripts   0 1   zkServer.sh is missing nohup and "sleep 1" when starting the background daemon.

This is normally fine; however, when the server is run remotely via ssh, the process does not run successfully (it starts but exits immediately).

I'll be submitting a patch for this shortly.
47490 No Perforce job exists for this issue. 1 32734
8 years, 39 weeks, 3 days ago
Reviewed
0|i05ys7:
ZooKeeper ZOOKEEPER-1073

address a documentation issue in ZOOKEEPER-1030

Bug Closed Minor Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 25/May/11 13:52   23/Nov/11 14:22 07/Jul/11 03:32 3.4.0 3.4.0 documentation   0 1   ZOOKEEPER-1030 updated the generated docs, not the source docs. I'll submit a patch to fix this in the source. 47491 No Perforce job exists for this issue. 1 32735
8 years, 38 weeks ago
Reviewed
0|i05ysf:
ZooKeeper ZOOKEEPER-1072

Support for embedded ZooKeeper

Task Open Major Unresolved Unassigned Vishal Kher Vishal Kher 24/May/11 18:08   05/Feb/20 07:17   3.3.0 3.7.0, 3.5.8 server   3 5  
We have seen several cases where users have embedded zookeeper in
their application instead of running ZooKeeper in an independent JVM.

Different applications use different ways of starting and stopping QuorumPeer.
Instead, we should provide a standard and simple API for starting/stopping
zookeeper (and also document it).


embedd, server 2434 No Perforce job exists for this issue. 0 42064
8 years, 44 weeks, 2 days ago 0|i07kcn:
ZooKeeper ZOOKEEPER-1071

zkServer.sh script needs to track whether ZK is already running or not

Bug Resolved Major Duplicate Unassigned Roman Shaposhnik Roman Shaposhnik 24/May/11 13:20   26/May/11 13:48 26/May/11 13:48     scripts   0 0   If one repeatedly invokes:

{noformat}
/usr/lib/zookeeper/bin/zkServer.sh start
{noformat}

after the initial start 2 bad things happen:

1. ZK reports that it started when in reality it failed with the following:
{noformat}
2011-05-24 10:18:58,217 - INFO [main:NIOServerCnxn$Factory@143] - binding to port 0.0.0.0/0.0.0.0:2181
2011-05-24 10:18:58,219 - FATAL [main:ZooKeeperServerMain@62] - Unexpected exception, exiting abnormally
java.net.BindException: Address already in use
{noformat}

2. It clobbers the zookeeper_server.pid file
214217 No Perforce job exists for this issue. 0 32736
8 years, 44 weeks ago 0|i05ysn:
ZooKeeper ZOOKEEPER-1070

let org.apache.zookeeper.recipes.lock.WriteLock implement java.util.concurrent.locks.Lock

Improvement Open Major Unresolved Unassigned Yanming Zhou Yanming Zhou 23/May/11 22:23   23/May/11 22:23       recipes   2 4   and add a ZooKeeper-based distributed java.util.concurrent.locks.ReadWriteLock.
Use concurrent locks internally instead of the synchronized keyword.
2435 No Perforce job exists for this issue. 0 42065
8 years, 44 weeks, 2 days ago 0|i07kcv:
ZooKeeper ZOOKEEPER-1069

Calling shutdown() on a QuorumPeer too quickly can lead to a corrupt log

Bug Closed Critical Fixed Vishal Kher Jeremy Stribling Jeremy Stribling 23/May/11 19:53   23/Nov/11 14:22 17/Jul/11 10:36 3.3.3 3.3.4, 3.4.0 quorum, server   0 1   Linux, ZK 3.3.3, 3-node cluster. I've only seen this happen once. In order to restart Zookeeper with a new set of servers, we have a wrapper class that calls shutdown() on an existing QuorumPeer, and then starts a new one with a new set of servers. Specifically, our shutdown code looks like this:

{code}
synchronized(_quorum_peer) {
    _quorum_peer.shutdown();
    FastLeaderElection fle = (FastLeaderElection) _quorum_peer.getElectionAlg();
    fle.shutdown(); // I think this is unnecessary
    try {
        _quorum_peer.getTxnFactory().commit();
    } catch (java.nio.channels.ClosedChannelException e) {
        // ignore
    }
}
{code}

One time, our wrapper class started one QuorumPeer, and then had to shut it down and start a new one very soon after the QuorumPeer transitioned into a FOLLOWING state. When the new QuorumPeer tried to read in the latest log from disk, it encountered a bogus magic number of all zeroes:

{noformat}
2011-05-18 22:42:29,823 10467 [pool-1-thread-2] FATAL org.apache.zookeeper.server.quorum.QuorumPeer - Unable to load database on disk
java.io.IOException: Transaction log: /var/cloudnet/data/zookeeper/version-2/log.700000001 has invalid magic number 0 != 1514884167
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:510)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:527)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:493)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:576)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:479)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:454)
at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:325)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:126)
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:222)
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:398)
...
2011-05-18 22:42:29,823 10467 [pool-1-thread-2] ERROR com.nicira.onix.zookeeper.Zookeeper - Unexpected exception
java.lang.RuntimeException: Unable to run quorum server
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:401)
at com.nicira.onix.zookeeper.Zookeeper.StartZookeeper(Zookeeper.java:198)
at com.nicira.onix.zookeeper.Zookeeper.RestartZookeeper(Zookeeper.java:277)
at com.nicira.onix.zookeeper.ZKRPCService.setServers(ZKRPC.java:83)
at com.nicira.onix.zookeeper.Zkrpc$ZKRPCService.callMethod(Zkrpc.java:8198)
at com.nicira.onix.rpc.RPC$10.run(RPC.java:534)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Transaction log: /var/cloudnet/data/zookeeper/version-2/log.700000001 has invalid magic number 0 != 1514884167
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:510)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:527)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:493)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:576)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:479)
at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:454)
at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:325)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:126)
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:222)
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:398)
... 8 more
{noformat}

I looked into the code a bit, and I believe the problem comes from the fact that QuorumPeer.shutdown() does not join() on this before returning. Here's the scenario I think can happen:

# QuorumPeer.run() notices it is in the FOLLOWING state, makes a new Follower, and calls Follower.followLeader(), which starts connecting to the leader.
# In the main program thread, QuorumPeer.shutdown() is called.
# Through a complicated series of calls, this eventually leads to FollowerZooKeeperServer.shutdown() being called.
# This method calls SyncRequestProcessor.shutdown(), which joins on this and returns. However, it's possible that the SyncRequestProcessor thread hasn't been started yet, because followLeader() hasn't yet called Learner.syncWithLeader(), which hasn't yet called ZooKeeperServer.startup(), which is what actually starts the thread. Thus the join has no effect, although a requestOfDeath is added to the queued requests list (possibly behind other requests).
# Back in the main thread, FileTxnSnapLog.commit() is called, which doesn't do much because the processor hasn't processed anything yet.
# Finally, ZooKeeperServer.startup is called in the QuorumPeer.run() thread, starting up the SyncRequestProcessor thread.
# That thread appends some request to the log. The log doesn't exist yet, so it creates a new one, padding it with zeroes.
# Now either the SyncRequestProcessor hits the requestOfDeath or the whole QuorumPeer object is deleted. It exits that thread without ever committing the log to disk (or the new QuorumPeer tries to read the log before the old thread gets to commit anything), and the log ends up with all zeroes instead of a proper magic number.

I haven't yet looked into whether there's an easy way to join() on the QuorumPeer thread from shutdown(), so that it won't go on to start the processor threads after it has been shut down. I wanted to check with the group first and see if anyone else agrees this could be a problem.
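A minimal sketch of the direction suggested here: shutdown waits for the peer thread to finish, and the thread refuses to start its request processor once shutdown has begun. `PeerThread`, `shutdownAndWait()` and the boolean flags are hypothetical; this is not the patch that was eventually committed for this issue.

```java
// Hypothetical stand-in for a peer thread that starts a sub-component
// (like the SyncRequestProcessor) partway through run().
class PeerThread extends Thread {
    private final Object lock = new Object();
    private boolean shutdownRequested = false;
    private boolean processorStarted = false;

    @Override
    public void run() {
        // ... connect to the leader and sync (omitted) ...
        synchronized (lock) {
            // Checking the flag under the same lock that shutdownAndWait()
            // uses means the processor cannot start once shutdown has begun.
            if (shutdownRequested) {
                return;
            }
            processorStarted = true;  // stands in for starting the processor
        }
        // ... process requests until shut down (omitted) ...
    }

    /** Requests shutdown, then waits for run() to return. */
    void shutdownAndWait() throws InterruptedException {
        synchronized (lock) {
            shutdownRequested = true;
        }
        interrupt();  // wake run() if it is blocked in the omitted work
        join();       // only after this is it safe to commit or close the log
    }

    boolean hasStartedProcessor() {
        synchronized (lock) {
            return processorStarted;
        }
    }
}
```

The lock closes the window in steps 4 to 6 above: either the processor is started before shutdown is recorded (and `join()` then waits for it), or shutdown is recorded first and the processor is never started.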

I marked this as minor since I think almost no one else uses Zookeeper this way, but it's pretty important to me personally.

I will upload a log file showing this behavior shortly.
persistence, shutdown 47492 No Perforce job exists for this issue. 4 32737
8 years, 36 weeks, 4 days ago 0|i05ysv:
ZooKeeper ZOOKEEPER-1068

Documentation and default config suggest incorrect location for Zookeeper state

Bug Closed Minor Fixed Roman Shaposhnik Roman Shaposhnik Roman Shaposhnik 23/May/11 19:08   23/Nov/11 14:22 21/Jun/11 13:24   3.4.0 documentation, scripts   0 1   Documentation and default config suggest /var/zookeeper as a value for dataDir. This practice is, strictly speaking, incompatible with UNIX/Linux filesystem layout standards (e.g. http://www.s-gms.ms.edus.si/cgi-bin/man-cgi?filesystem+5 , http://tldp.org/LDP/Linux-Filesystem-Hierarchy/html/index.html ).

Even though ZooKeeper use is not limited to UNIX-like OSes, I'd recommend that we change references from /var/zookeeper to /var/lib/zookeeper.
47493 No Perforce job exists for this issue. 1 32738
8 years, 40 weeks, 1 day ago
Reviewed
0|i05yt3:
ZooKeeper ZOOKEEPER-1067

the doxygen doc should be generated as part of the release

Improvement Open Major Unresolved Unassigned Benjamin Reed Benjamin Reed 23/May/11 12:54   05/Feb/20 07:16     3.7.0, 3.5.8     0 1   Currently our releases generate the javadoc as part of the documentation. We should also generate the Doxygen docs for the C API. 2436 No Perforce job exists for this issue. 0 42066
8 years, 36 weeks, 5 days ago 0|i07kd3:
ZooKeeper ZOOKEEPER-1066

Multi should have an async version

Bug Open Major Unresolved Unassigned Ted Dunning Ted Dunning 21/May/11 17:09   10/Oct/13 13:24       c client   1 2   Per the code review on ZOOKEEPER-965, it seems that multi should have an asynchronous version.

The semantics should be essentially identical. The only difference is that the original caller shouldn't wait for the result. Cloning existing multi-operations should be a decent implementation strategy.
2437 No Perforce job exists for this issue. 0 32739
6 years, 24 weeks ago 0|i05ytb:
ZooKeeper ZOOKEEPER-1065

Possible timing issue in embedded server

Bug Resolved Major Invalid Unassigned Gunnar Wagenknecht Gunnar Wagenknecht 20/May/11 02:46   20/May/11 14:39 20/May/11 14:30 3.3.3   java client, server   0 0   Windows 7, 32bit, Core2 Duo T9300, JDK 1.6.0_24, ZooKeeper data on 500GB hybrid Seagate HDD with 4GB SSD cache I have an application that uses ZooKeeper. There is an ensemble in
production, but to simplify development the application starts an embedded
ZooKeeper server when run in development mode. We are experiencing a timing
issue with ZooKeeper 3.3.3, and I was wondering whether this is allowed to
happen or whether we did something wrong when starting the embedded server.


Basically, we have a watch registered using an #exists call and watch
code like the following.
{code}
@Override
public void process(final WatchedEvent event) {
    switch (event.getType()) {
    ...
    case NodeCreated:
        pathCreated(event.getPath());
        break;
    ...
    }
}

@Override
protected void pathCreated(final String path) {
    // process events only for this node
    if (!isMyPath(path))
        return;
    try {
        loadNode(); // calls zk.getData(String, Watcher, Stat)
    } catch (final Exception e) {
        // got NoNodeException here (but not when debugging)
        log(..., e)
    }
}
{code}


From inspecting the logs we noticed a NoNodeException. When setting
breakpoints on #loadNode and stepping through we don't get the
exception. But when setting a breakpoint on #log only we got a hit and
could confirm the issue this way.

The path is actually several levels deep. None of the parent paths exist
either, so they are created as well. However, no exception is thrown for
them. The sequence is as follows.

{noformat}
/l1 --> watch triggered, getData, no exception
/l1/l2 --> watch triggered, getData, no exception
/l1/l2/l3 --> watch triggered, getData, no exception
/l1/l2/l3/l4 --> watch triggered, getData, no exception
/l1/l2/l3/l4/l5 --> watch triggered, getData, no exception
/l1/l2/l3/l4/l5/l6 --> watch triggered, getData, NoNodeException
{noformat}

The only difference is that all paths up to and including l5 do not actually
have any data; only l6 has some data. Could there be some latency issue?

For completeness, the embedded server is started as follows.
{code}
// disable LOG4J JMX stuff
System.setProperty("zookeeper.jmx.log4j.disable", Boolean.TRUE.toString());

// get directories
final File dataDir = new File(config.getDataLogDir());
final File snapDir = new File(config.getDataDir());

// clean old logs
PurgeTxnLog.purge(dataDir, snapDir, 3);

// create standalone server
zkServer = new ZooKeeperServer();
zkServer.setTxnLogFactory(new FileTxnSnapLog(dataDir, snapDir));
zkServer.setTickTime(config.getTickTime());
zkServer.setMinSessionTimeout(config.getMinSessionTimeout());
zkServer.setMaxSessionTimeout(config.getMaxSessionTimeout());

factory = new NIOServerCnxn.Factory(config.getClientPortAddress(),
        config.getMaxClientCnxns());

// start server
LOG.info("Starting ZooKeeper standalone server.");
try {
    factory.startup(zkServer);
} catch (final InterruptedException e) {
    LOG.warn("Interrupted during server start.", e);
    Thread.currentThread().interrupt();
}
{code}
214216 No Perforce job exists for this issue. 1 32740
8 years, 44 weeks, 6 days ago 0|i05ytj:
ZooKeeper ZOOKEEPER-1064

Startup script needs more LSB compatability

Bug Resolved Major Implemented Unassigned Ted Dunning Ted Dunning 18/May/11 20:36   10/Oct/13 13:25 10/Oct/13 13:25 3.3.2       0 2   The zkServer.sh script kind of sort of implements the standard init.d style of interaction.

It lacks

- nice return codes

- status method

- standard output messages

See

http://refspecs.freestandards.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html

and

http://refspecs.freestandards.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptfunc.html

and

http://wiki.debian.org/LSBInitScripts

It is an open question how much zkServer should use these LSB scripts, because that may impair portability. I think it should produce similar messages, however, and should return standardized error codes. If LSB functions are available, I think they should be used so that ZK works as a first-class citizen.


I will produce a proposed patch.
2439 No Perforce job exists for this issue. 0 32741
6 years, 24 weeks ago 0|i05ytr:
ZooKeeper ZOOKEEPER-1063

Dubious synchronization in Zookeeper and ClientCnxnSocketNIO classes

Bug Closed Critical Fixed Yanick Dufresne Yanick Dufresne Yanick Dufresne 17/May/11 17:34   23/Nov/11 14:22 15/Jul/11 00:11   3.4.0 java client   0 2   Synchronization around dataWatches, existWatches and childWatches in Zookeeper is incorrect.
Synchronization around outgoingQueue and pendingQueue in ClientCnxnSocketNIO is incorrect.
Synchronization around selector and key sets in ClientCnxnSocketNIO seems odd.
47494 No Perforce job exists for this issue. 3 32742
8 years, 36 weeks, 6 days ago 0|i05ytz:
ZooKeeper ZOOKEEPER-1062

Net-ZooKeeper: Net::ZooKeeper consumes 100% cpu on wait

Bug Resolved Major Fixed Botond Hejj Patrick D. Hunt Patrick D. Hunt 13/May/11 01:50   20/May/14 07:09 16/May/14 18:33 3.3.1, 3.4.5, 3.4.6 3.4.7, 3.5.0 contrib-bindings   0 5   Reported by a user on the CDH user list (user reports that the listed fix addressed this issue for him):

"Net::ZooKeeper consumes 100% cpu when "wait" is used. At my initial inspection, it seems to be related to implementation mistake in pthread_cond_timedwait."

https://rt.cpan.org/Public/Bug/Display.html?id=61290
patch 2440 No Perforce job exists for this issue. 2 32743
5 years, 44 weeks, 2 days ago Cosmetic fixes to the patch 0|i05yu7:
ZooKeeper ZOOKEEPER-1061

Zookeeper stop fails if start called twice

Bug Closed Major Fixed Ted Dunning Ted Dunning Ted Dunning 10/May/11 16:38   30/Mar/17 10:27 16/May/11 13:12 3.3.2 3.4.0 scripts   0 4   The zkServer.sh script doesn't properly check whether a previously started
server is still running. If you call start twice, the second invocation
overwrites the PID file with the PID of a process that then fails because the
port is already in use.

This means that stop will subsequently fail.

Here is a reference that describes how init scripts should normally work:

http://refspecs.freestandards.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html

37453 No Perforce job exists for this issue. 1 32744
2 years, 51 weeks ago
Reviewed
0|i05yuf:
ZooKeeper ZOOKEEPER-1060

QuorumPeer takes a long time to shutdown

Bug Closed Minor Fixed Vishal Kher Vishal Kher Vishal Kher 10/May/11 15:32   23/Nov/11 14:22 14/Jun/11 08:14 3.4.0 3.4.0 quorum   0 2   This problem is seen only if you have ZooKeeper embedded in your application. QuorumPeerMain.initializeAndRun() does a quorumPeer.join() before exiting.

QuorumPeer.shutdown() tries to clean up everything, but it does not interrupt itself. As a result, if the peer is running FLE, it might be waiting to receive notifications (recvqueue.poll()) in FastLeaderElection. Therefore, quorumPeer.join() will wait until the peer wakes up from poll().

The fix is simple - call this.interrupt() in QuorumPeer.shutdown().
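The effect of the proposed one-line fix can be seen with a hypothetical thread that, like a peer in fast leader election, blocks in `poll()` on its receive queue. `ElectionLoop` and the 60-second timeout are illustrative assumptions, not QuorumPeer's actual code:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical peer thread that, like a peer running fast leader election,
// spends most of its time blocked in poll() on a notification queue.
class ElectionLoop extends Thread {
    final BlockingQueue<String> recvQueue = new LinkedBlockingQueue<>();
    private volatile boolean running = true;

    @Override
    public void run() {
        while (running) {
            try {
                String notification = recvQueue.poll(60, TimeUnit.SECONDS);
                if (notification != null) {
                    // ... handle the notification (omitted) ...
                }
            } catch (InterruptedException e) {
                // Woken by shutdown(); the loop condition decides whether to exit.
            }
        }
    }

    void shutdown() {
        running = false;
        // Without this call, a join() on this thread could wait for the
        // remainder of the 60-second poll timeout.
        this.interrupt();
    }
}
```

Because `running` is cleared before the interrupt, the thread exits whether the interrupt lands inside `poll()` or just before it; in the latter case the pending interrupt status makes the next `poll()` throw immediately.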
47495 No Perforce job exists for this issue. 1 32745
8 years, 41 weeks, 2 days ago 0|i05yun:
ZooKeeper ZOOKEEPER-1059

stat command issued on non-existing node causes NPE

Bug Closed Major Fixed Bhallamudi Venkata Siva Kamesh Bhallamudi Venkata Siva Kamesh Bhallamudi Venkata Siva Kamesh 04/May/11 06:50   23/Nov/11 14:22 16/May/11 13:39   3.4.0 java client   0 1   Issuing the *stat* command on a non-existent ZooKeeper node causes an NPE in the client.
{noformat}
[zk: localhost:2181(CONNECTED) 2] stat /invalidPath
Exception in thread "main" java.lang.NullPointerException
at org.apache.zookeeper.ZooKeeperMain.printStat(ZooKeeperMain.java:131)
at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:723)
at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:582)
at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:354)
at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:312)
at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:271)

{noformat}
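`ZooKeeper.exists()` returns `null` when the node does not exist, so the NPE in `printStat` is presumably the CLI passing that `null` straight through. A minimal sketch of the guard, using hypothetical `NodeStat` and `StatCommand` types and an illustrative message rather than the CLI's real output:

```java
// Hypothetical stand-in for org.apache.zookeeper.data.Stat (two fields only).
final class NodeStat {
    final long czxid;
    final int dataVersion;

    NodeStat(long czxid, int dataVersion) {
        this.czxid = czxid;
        this.dataVersion = dataVersion;
    }
}

final class StatCommand {
    private StatCommand() {}

    /**
     * Formats the result of an exists()-style lookup. A null stat means the
     * node does not exist, so it is handled before any field is read.
     */
    static String render(String path, NodeStat stat) {
        if (stat == null) {
            return "Node does not exist: " + path;
        }
        return "cZxid = 0x" + Long.toHexString(stat.czxid) + "\n"
                + "dataVersion = " + stat.dataVersion;
    }
}
```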
47496 No Perforce job exists for this issue. 1 32746
8 years, 45 weeks, 2 days ago
Reviewed
0|i05yuv:
ZooKeeper ZOOKEEPER-1058

fix typo in opToString for getData

Bug Closed Trivial Fixed Camille Fournier Camille Fournier Camille Fournier 03/May/11 20:34   23/Nov/11 14:22 20/May/11 17:42   3.4.0     0 1   Fix Request so that a getData request prints as getData instead of getDate. 47497 No Perforce job exists for this issue. 1 32747
8 years, 44 weeks, 5 days ago Committed revision 1125544
Reviewed
0|i05yv3:
ZooKeeper ZOOKEEPER-1057

zookeeper c-client, connection to offline server fails to successfully fallback to second zk host

Bug Closed Blocker Fixed Michi Mutsuzaki Woody Anderson Woody Anderson 02/May/11 21:16   13/Mar/14 14:17 09/Jan/14 16:04 3.3.1, 3.3.2, 3.3.3 3.4.6, 3.5.0 c client   2 8   snowdutyrise-lm ~/-> uname -a
Darwin snowdutyrise-lm 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15 16:55:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386

also observed on:
2.6.35-28-server 49-Ubuntu SMP Tue Mar 1 14:55:37 UTC 2011
Hello, I'm a contributor for the node.js zookeeper module: https://github.com/yfinkelstein/node-zookeeper
I'm using ZK 3.3.3 for the purposes of this issue, but I have verified that it also fails on 3.3.1 and 3.3.2.

I'm having an issue when trying to connect while one of my ZooKeeper servers is offline.
If the first server attempted is online, all is good.

If the offline server is attempted first, then the client is never able to connect to _any_ server.
Inside zookeeper.c a connection loss (-4) is received, the socket is closed and buffers are cleaned up. The client then attempts the next server in the list and creates a new socket (which gets the same fd as the previously closed socket), but connecting fails, and it continues to fail seemingly forever.
The nature of this "fail" is not that it gets -4 connection loss errors, but that zookeeper_interest doesn't find any activity on the socket before the user-provided timeout kicks in. I don't want to have to wait 5 minutes, even if I could make myself.

This is the message that follows the connection loss:
2011-04-27 23:18:28,355:13485:ZOO_ERROR@handle_socket_error_msg@1530: Socket [127.0.0.1:5020] zk retcode=-7, errno=60(Operation timed out): connection timed out (exceeded timeout by 3ms)
2011-04-27 23:18:28,355:13485:ZOO_ERROR@yield@213: yield:zookeeper_interest returned error: -7 - operation timeout

While investigating, I decided to comment out close(zh->fd) in handle_error (zookeeper.c#1153).
Now everything works (obviously I'm leaking an fd), and the connection to the second host succeeds immediately.
This is the behavior I'm looking for, though I clearly don't want to leak the fd, so I'm wondering why the fd reuse is causing this issue.
close() is not returning an error (I checked, even though the current code assumes success).

I'm on OS X 10.6.7.
I tried adding a setsockopt SO_LINGER (though I didn't want that to be the solution); it didn't work.

Full debug traces are included in the issue here: https://github.com/yfinkelstein/node-zookeeper/issues/6
2441 No Perforce job exists for this issue. 7 32748
6 years, 2 weeks ago 0|i05yvb:
ZooKeeper ZOOKEEPER-1056

Questions and Improvements for the C client codebase

Bug Open Minor Unresolved Unassigned Stephen Tyree Stephen Tyree 26/Apr/11 23:20   26/Apr/11 23:20   3.4.0   c client   0 0   Having used the C client for a few months now, I thought I'd look through the code and see if anything could be improved or fixed in order to be a good citizen. Here are some observations and questions I was hoping people could elaborate on.

- There appears to be a bug in sub_string (zookeeper.c). Because of a misplaced parenthesis, the third argument passed to strncmp is a conditional expression, so the length is either 0 or 1. This likely leads to many false positives when matching chroots against paths.
- There appears to be a bug in queue_session_event, where we check that cptr->buffer is not NULL after already dereferencing it.
- In both queue_buffer and queue_completion_nolock, we assert a condition that we have just checked.
- What is the policy on whether the results of memory allocations are checked, asserted against, or ignored? This is done inconsistently.
- What is the policy on whether pointers are checked/set against NULL versus 0? This is done inconsistently.
- Some functions, such as zoo_wget_children2_, have needlessly high cyclomatic complexity.
- What is the policy on line length? Some functions jump through hoops to enforce 80 characters while others do no such thing.
- What is the policy on indentation and spacing of if statements and blocks of code? This is done inconsistently.

If any or all of these turn out to be issues that need to be fixed, I'd be more than happy to do so.
2442 No Perforce job exists for this issue. 0 32749
8 years, 48 weeks, 1 day ago 0|i05yvj:
ZooKeeper ZOOKEEPER-1055

check for duplicate ACLs in addACL() and create()

Bug Closed Major Fixed Eugene Joseph Koontz Eugene Joseph Koontz Eugene Joseph Koontz 26/Apr/11 17:08   23/Nov/11 14:22 14/Aug/11 20:35 3.4.0 3.4.0     0 1   actual result:


[zk: (CONNECTED) 0] create /test2 'test2' digest:test:test:cdrwa,digest:test:test:cdrwa
Created /test2
[zk: (CONNECTED) 1] getAcl /test2
'digest,'test:test
: cdrwa
'digest,'test:test
: cdrwa
[zk: (CONNECTED) 2]

But getAcl should show only a single entry.
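A minimal sketch of the requested duplicate check, written in C for illustration only (the real check would run on the server, and the acl_entry type and dedupe_acls function are hypothetical): keep the first occurrence of each entry and drop any later entry whose permissions, scheme, and id all match.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical ACL entry: permission bitmask plus scheme and id strings. */
struct acl_entry {
    int perms;
    const char *scheme;
    const char *id;
};

static int acl_equal(const struct acl_entry *a, const struct acl_entry *b)
{
    return a->perms == b->perms &&
           strcmp(a->scheme, b->scheme) == 0 &&
           strcmp(a->id, b->id) == 0;
}

/* Compacts acls[0..n) in place, keeping the first occurrence of each
 * distinct entry, and returns the new count. O(n^2), fine for short lists. */
size_t dedupe_acls(struct acl_entry *acls, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        int dup = 0;
        for (size_t j = 0; j < out; j++) {
            if (acl_equal(&acls[i], &acls[j])) {
                dup = 1;
                break;
            }
        }
        if (!dup)
            acls[out++] = acls[i];
    }
    return out;
}
```

With the two identical digest:test:test:cdrwa entries from the example (cdrwa is the permission mask 31), the list collapses to a single entry.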
47498 No Perforce job exists for this issue. 6 32750
8 years, 32 weeks, 3 days ago refresh against trunk.
Reviewed
0|i05yvr:
ZooKeeper ZOOKEEPER-1054

Drop connections from servers not in the cluster configuration

Improvement Open Minor Unresolved Bhallamudi Venkata Siva Kamesh Bhallamudi Venkata Siva Kamesh Bhallamudi Venkata Siva Kamesh 26/Apr/11 04:29   05/Feb/20 07:16     3.7.0, 3.5.8 leaderElection   0 2   Suppose a ZooKeeper cluster is running on the following machines:

{noformat}
server.1=10.18.52.133:2999:3999
server.2=10.18.52.253:2999:3999
server.3=10.18.52.96:2999:3999
{noformat}


If another ZooKeeper server (10.18.52.109), which is not part of the cluster configuration, tries to participate in leader election, the log of one of the cluster's servers fills up with INFO messages like the following:

{noformat}
2011-04-19 17:42:42,457 - INFO [/10.18.52.133:3999:QuorumCnxManager$Listener@486] - Received connection request /10.18.52.109:18324
{noformat}
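A hypothetical sketch of the filtering this issue asks for (the function names, the fixed 64-byte host buffer, and IPv4-only parsing are all assumptions; the real listener is QuorumCnxManager in Java): extract the host from each server.N=host:port:port line and accept a connection only if its source address matches one of them.

```c
#include <stddef.h>
#include <string.h>

/* Copies the host part of a "server.N=host:port:port" line into buf.
 * Assumes an IPv4 address or hostname (no colons in the host).
 * Returns 0 on success, -1 if the line is malformed or buf is too small. */
int config_host(const char *line, char *buf, size_t buflen)
{
    const char *start = strchr(line, '=');
    if (start == NULL)
        return -1;
    start++;
    const char *end = strchr(start, ':');
    if (end == NULL)
        return -1;
    size_t n = (size_t)(end - start);
    if (n + 1 > buflen)
        return -1;
    memcpy(buf, start, n);
    buf[n] = '\0';
    return 0;
}

/* Returns 1 if remote_ip is the host of one of the configured server
 * lines; otherwise 0, and the caller would drop the connection. */
int is_configured_peer(const char *remote_ip,
                       const char *const *lines, size_t nlines)
{
    char host[64];
    for (size_t i = 0; i < nlines; i++) {
        if (config_host(lines[i], host, sizeof host) == 0 &&
            strcmp(host, remote_ip) == 0)
            return 1;
    }
    return 0;
}
```

With the three server lines above, 10.18.52.133 is accepted and 10.18.52.109 is rejected.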
security 34 No Perforce job exists for this issue. 5 711
6 years, 2 weeks, 1 day ago 0|i00h5b:
ZooKeeper ZOOKEEPER-1053

PurgeTxnLog only takes relative paths

Improvement Resolved Major Invalid Unassigned Jun Rao Jun Rao 25/Apr/11 20:53   23/May/14 07:30 23/May/14 07:30 3.3.3   server   0 2   PurgeTxnLog only works with relative paths for the data and snapshot directories. It should support absolute paths too. 2443 No Perforce job exists for this issue. 0 42067
5 years, 43 weeks, 6 days ago 0|i07kdb:
ZooKeeper ZOOKEEPER-1052

Findbugs warning in QuorumPeer.ResponderThread.run()

Bug Closed Major Fixed Flavio Paiva Junqueira Flavio Paiva Junqueira Flavio Paiva Junqueira 24/Apr/11 09:45   23/Nov/11 14:22 03/May/11 13:58 3.3.2 3.4.0     1 1   {noformat}
REC Exception is caught when Exception is not thrown in org.apache.zookeeper.server.quorum.QuorumPeer$ResponderThread.run()
{noformat}
47499 No Perforce job exists for this issue. 1 32751
8 years, 47 weeks, 1 day ago 0|i05yvz:
ZooKeeper ZOOKEEPER-1051

SIGPIPE in Zookeeper 0.3.* when send'ing after cluster disconnection

Bug Closed Minor Fixed Stephen Tyree Stephen Tyree Stephen Tyree 21/Apr/11 09:56   23/Nov/11 14:21 30/Aug/11 03:02 3.3.2, 3.3.3, 3.4.0 3.4.0 c client   1 3 7200 7200 0% In libzookeeper_mt, if your process is running rather slowly (such as when running it under Valgrind's Memcheck) or you are using gdb with breakpoints, you can occasionally get SIGPIPE when trying to send a message to the cluster. For example:

{noformat}
==12788==
==12788== Process terminating with default action of signal 13 (SIGPIPE)
==12788== at 0x3F5180DE91: send (in /lib64/libpthread-2.5.so)
==12788== by 0x7F060AA: ??? (in /usr/lib64/libzookeeper_mt.so.2.0.0)
==12788== by 0x7F06E5B: zookeeper_process (in /usr/lib64/libzookeeper_mt.so.2.0.0)
==12788== by 0x7F0D38E: ??? (in /usr/lib64/libzookeeper_mt.so.2.0.0)
==12788== by 0x3F5180673C: start_thread (in /lib64/libpthread-2.5.so)
==12788== by 0x3F50CD3F6C: clone (in /lib64/libc-2.5.so)
==12788==
{noformat}

This is probably not the behavior we want, since we already handle server disconnections after a failed call to send. There are a few ways to fix this. In BSD environments, we can tell a socket never to raise SIGPIPE on send by using setsockopt:

setsockopt(sd, SOL_SOCKET, SO_NOSIGPIPE, (void *)&set, sizeof(int));

In Linux environments, we can add the MSG_NOSIGNAL flag to every send call, which suppresses SIGPIPE when the peer has closed the connection (send still fails with EPIPE).

For more information, see: http://stackoverflow.com/questions/108183/how-to-prevent-sigpipes-or-handle-them-properly
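A sketch combining both options behind one portable call (the helper names disable_sigpipe and send_nosigpipe are hypothetical; setsockopt, send, SO_NOSIGPIPE, and MSG_NOSIGNAL are the standard socket APIs mentioned above). Where a platform lacks MSG_NOSIGNAL the flag falls back to 0, and where it lacks SO_NOSIGPIPE the setsockopt step is skipped; a platform with neither would still need SIGPIPE ignored process-wide.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* MSG_NOSIGNAL exists on Linux; elsewhere fall back to 0 and rely on
 * SO_NOSIGPIPE being set once per socket. */
#ifndef MSG_NOSIGNAL
#define MSG_NOSIGNAL 0
#endif

/* Call once after creating the socket. On BSD/macOS this sets
 * SO_NOSIGPIPE; elsewhere it is a no-op. Returns 0 on success, -1 on error. */
int disable_sigpipe(int sd)
{
#ifdef SO_NOSIGPIPE
    int set = 1;
    return setsockopt(sd, SOL_SOCKET, SO_NOSIGPIPE, &set, sizeof set);
#else
    (void)sd;
    return 0;
#endif
}

/* send() that reports a closed peer as -1 with errno == EPIPE instead of
 * raising SIGPIPE, so the caller's normal disconnect handling runs. */
ssize_t send_nosigpipe(int sd, const void *buf, size_t len)
{
    return send(sd, buf, len, MSG_NOSIGNAL);
}
```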
0% 0% 7200 7200 47500 No Perforce job exists for this issue. 2 32752
8 years, 30 weeks, 2 days ago Add flag to socket send on Linux that prevents SIGPIPE from being fired should the Zookeeper cluster close the connection on its side.
Reviewed
0|i05yw7:
ZooKeeper ZOOKEEPER-1050

zooinspector shell scripts do not work

Bug Resolved Trivial Fixed Will Johnson Chris Burroughs Chris Burroughs 20/Apr/11 20:46   06/Jan/12 05:57 05/Jan/12 20:23 3.3.2 3.5.0 contrib   0 3   * zooInspector-dev.sh uses DOS line endings. Dash at least chokes on this.
* zooInspector.sh has an errant ; in the classpath.

Also, there is no real reason to hard-code the required ZooKeeper version in lib; just use a glob.
zooinspector 2444 No Perforce job exists for this issue. 2 32753
8 years, 11 weeks, 6 days ago
Reviewed
0|i05ywf:
ZooKeeper ZOOKEEPER-1049

Session expire/close flooding renders heartbeats to delay significantly

Bug Closed Critical Fixed Chang Song Chang Song Chang Song 15/Apr/11 23:42   23/Nov/11 14:22 03/May/11 17:30 3.3.2 3.3.4, 3.4.0 server   0 6   CentOS 5.3, three node ZK ensemble Let's say we have 100 clients (group A) already connected to a three-node ZK ensemble with a session timeout of 15 seconds. We also have 1000 clients (group B) already connected to the same ZK ensemble, all watching several nodes (also with a 15-second session timeout).

Consider a case in which all clients in group B suddenly hang or deadlock (JVM OOME) at the same time. 15 seconds later, all sessions in group B expire, creating a session-closing stampede. Depending on the number of clients in group B, every request/response the ZK ensemble processes gets delayed by up to 8 seconds (with the 1000 clients we tested).

This delay causes some clients in group A to have their sessions expire because their heartbeat responses are delayed, which makes healthy servers drop out of their clusters. This is a serious problem in our installation, since some of our services, such as batch servers and CI servers, create the same scenario almost every day.

I am attaching a graph showing ping response time delay.

I think the ordering between session creation/closing and ping exchange isn't important (for the quorum state machine). At least ping requests/responses should be handled independently (in a different queue and a different thread) so that pings stay real-time.

As a workaround, we are raising the session timeout to 50 seconds.
But this significantly increases the maximum failover time of the cluster, so the QoS we initially promised cannot be met.







47501 No Perforce job exists for this issue. 3 32754
8 years, 24 weeks, 1 day ago 0|i05ywn:
ZooKeeper ZOOKEEPER-1048

addauth command does not work in cli_mt/cli_st

Bug Resolved Major Fixed allengao allengao allengao 13/Apr/11 05:40   02/Mar/16 20:36 05/May/12 23:47 3.3.1 3.3.6, 3.4.4, 3.5.0 c client   0 3 604800 604800 0% SUSE_64 I cannot operate on a node that has an ACL after using "addauth" in cli_st. I have fixed this bug.
Original:
{noformat}
else if (startsWith(line, "addauth ")) {
    char *ptr;
    line += 8;
    ptr = strchr(line, ' ');
    if (ptr) {
        *ptr = '\0';
        ptr++;
    }
    zoo_add_auth(zh, line, ptr, ptr ? strlen(ptr) - 1 : 0, NULL, NULL);
{noformat}
Now:
{noformat}
    zoo_add_auth(zh, line, ptr, ptr ? strlen(ptr) : 0, NULL, NULL);
{noformat}
Passing strlen(ptr) as the certificate length is correct; subtracting 1 drops the last character of the credential.
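The parsing step can be checked in isolation with a hypothetical helper that mirrors the cli_st snippet (split_addauth is not a ZooKeeper function): it splits "scheme cert" at the first space and returns the certificate length that would be passed to zoo_add_auth().

```c
#include <stddef.h>
#include <string.h>

/* Splits "scheme cert" in place at the first space, mirroring the cli_st
 * snippet. On return *scheme points at the scheme and *cert at the
 * certificate (or NULL if there was none). Returns the certificate length
 * to pass to zoo_add_auth(): the full strlen, with no "- 1". */
size_t split_addauth(char *line, char **scheme, char **cert)
{
    char *ptr = strchr(line, ' ');
    if (ptr) {
        *ptr = '\0';  /* terminate the scheme */
        ptr++;        /* certificate starts after the space */
    }
    *scheme = line;
    *cert = ptr;
    return ptr ? strlen(ptr) : 0;
}
```

For "digest user:pass" this returns 9, the full length of "user:pass"; the original "- 1" would have yielded 8 and sent "user:pas".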
0% 0% 604800 604800 patch 2445 No Perforce job exists for this issue. 0 32755
7 years, 46 weeks, 4 days ago addauth 0|i05ywv:
ZooKeeper ZOOKEEPER-1047

ZooKeeper Standalone does not shutdown cleanly

Bug Open Major Unresolved Unassigned Gunnar Wagenknecht Gunnar Wagenknecht 13/Apr/11 05:04   13/Apr/11 05:04   3.3.3   server   0 1   When I shut down a standalone ZooKeeper server programmatically, the following exception is logged. Occasionally, no exception is logged.
{noformat}
10:32:43.353 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] WARN o.a.zookeeper.server.NIOServerCnxn - Ignoring unexpected runtime exception
java.nio.channels.CancelledKeyException: null
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55) ~[na:1.6.0_24]
at sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImpl.java:69) ~[na:1.6.0_24]
at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:241) ~[na:na]
10:32:43.353 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] INFO o.a.zookeeper.server.NIOServerCnxn - NIOServerCnxn factory exited run method
10:32:43.387 [SyncThread:0] INFO o.a.z.server.SyncRequestProcessor - SyncRequestProcessor exited!
10:32:43.387 [ProcessThread:-1] INFO o.a.z.server.PrepRequestProcessor - PrepRequestProcessor exited loop!
10:32:43.387 [app thread] INFO o.a.z.server.FinalRequestProcessor - shutdown of request processor complete
{noformat}

Because it's logged at WARN level, I assume something is wrong on shutdown. However, I follow exactly the same shutdown order as ZooKeeperMain, i.e. I shut down the {{NIOServerCnxn.Factory}} first and then shut down the {{ZooKeeperServer}} instance if it's still running.

{noformat}
...
factory.shutdown();
factory = null;

if (zkServer.isRunning()) {
    zkServer.shutdown();
}
zkServer = null;
{noformat}


2446 No Perforce job exists for this issue. 0 32756
8 years, 50 weeks, 1 day ago 0|i05yx3:
ZooKeeper ZOOKEEPER-1046

Creating a new sequential node results in a ZNODEEXISTS error

Bug Closed Blocker Fixed Vishal Kher Jeremy Stribling Jeremy Stribling 12/Apr/11 18:24   23/Nov/11 14:22 14/Jul/11 10:24 3.3.2, 3.3.3 3.3.4, 3.4.0 server   2 3   A 3-node cluster running Debian squeeze. On several occasions, I've seen a create() with the sequential flag set fail with a ZNODEEXISTS error, and I don't think that should ever be possible. In past runs, I've been able to closely inspect the state of the system with the command line client, and saw that the parent znode's cversion is smaller than the sequence numbers of existing child znodes under that parent. In one example:

{noformat}
[zk:<ip:port>(CONNECTED) 3] stat /zkrsm
cZxid = 0x5
ctime = Mon Jan 17 18:28:19 PST 2011
mZxid = 0x5
mtime = Mon Jan 17 18:28:19 PST 2011
pZxid = 0x1d819
cversion = 120710
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 2955
{noformat}

However, the znode /zkrsm/000000000000002d_record0000120804 existed on disk.
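For context, a sketch of how a sequential name is derived, assuming the server's rule of appending the parent's cversion, zero-padded to ten digits, to the requested path (the helper names are hypothetical). Because the name is a pure function of cversion, a cversion that lags behind the largest existing suffix (120710 versus 0000120804 here) means later creates can regenerate a name that is already taken, which would surface as ZNODEEXISTS.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds the name of a new sequential child by
 * appending the parent's cversion, zero-padded to ten digits.
 * Returns the snprintf result (>= buflen means the name was truncated). */
int sequential_name(char *buf, size_t buflen, const char *path, int cversion)
{
    return snprintf(buf, buflen, "%s%010d", path, cversion);
}

/* Returns 1 if the name generated from cversion equals an existing child. */
int collides(const char *existing, const char *path, int cversion)
{
    char name[256];
    sequential_name(name, sizeof name, path, cversion);
    return strcmp(name, existing) == 0;
}
```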

In a recent run, I was able to capture the Zookeeper logs, and I will attach them to this JIRA. The logs are named as nodeX.<zxid_prefixes>.log, and each new log represents an application process restart.

Here's the scenario:

# There's a cluster with nodes 1,2,3 using zxid 0x3.
# All three nodes restart, forming a cluster of zxid 0x4.
# Node 3 restarts, leading to a cluster of 0x5.

At this point, it seems like node 1 is the leader of the 0x5 epoch. In its log (node1.0x4-0x5.log) you can see the first (of many) instances of the following message:

{noformat}
2011-04-11 21:16:12,607 16649 [ProcessThread:-1] INFO org.apache.zookeeper.server.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x512f466bd44e0002 type:create cxid:0x4da376ab zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:/zkrsm/00000000000000b2_record0001761440 Error:KeeperErrorCode = NodeExists for /zkrsm/00000000000000b2_record0001761440
{noformat}

This then repeats forever, because my application doesn't expect ever to get this error on a sequential-node create and just keeps retrying. The message even carries over to node3.0x5-0x6.log once the 0x6 epoch comes into play.

I don't see anything terribly fishy in the transition between the epochs; the correct snapshots seem to be getting transferred, etc. Unfortunately I don't have a ZK snapshot/log that exhibits the problem when starting with a fresh system.

Some oddities you might notice in these logs:
* Between epochs 0x3 and 0x4, the zookeeper IDs of the nodes changed due to a bug in our application code. (They are assigned randomly, but are supposed to be consistent across restarts.)
* We manage node membership dynamically, and our application restarts the ZooKeeperServer classes whenever a new node wants to join (without restarting the entire application process). This is why you'll see messages like the following in node1.0x4-0x5.log before a new election begins:
{noformat}
2011-04-11 21:16:00,762 4804 [QuorumPeer:/0.0.0.0:2888] INFO org.apache.zookeeper.server.quorum.Learner - shutdown called
{noformat}
* There is in fact one of these dynamic membership changes in node1.0x4-0x5.log, just before the 0x4 epoch is formed. I'm not sure how this would be related though, as no transactions are done during this period.
sequence 47502 No Perforce job exists for this issue. 10 32757
8 years, 21 weeks, 6 days ago sequential znodeexists 0|i05yxb:
ZooKeeper ZOOKEEPER-1045

Support Quorum Peer mutual authentication via SASL

New Feature Closed Critical Fixed Rakesh Radhakrishnan Eugene Joseph Koontz Eugene Joseph Koontz 06/Apr/11 18:01   14/Jul/19 12:39 05/Dec/16 19:20   3.4.10 quorum, security   2 31   ZOOKEEPER-938 addresses mutual authentication between clients and servers. This bug, on the other hand, is for authentication among quorum peers. Hopefully much of the work done on SASL integration with Zookeeper for ZOOKEEPER-938 can be used as a foundation for this enhancement.

Review board: https://reviews.apache.org/r/47354/
2447 No Perforce job exists for this issue. 29 42068
35 weeks, 4 days ago
Reviewed
0|i07kdj:
ZooKeeper ZOOKEEPER-1044

ZOOKEEPER-107 Allow dynamic changes to roles of a peer

Sub-task Resolved Major Fixed Alexander Shraer Vishal Kher Vishal Kher 04/Apr/11 14:27   13/Jun/16 10:20 23/May/14 14:14 3.3.0 3.5.0 quorum   2 10   Requirement: functionality that will reconfigure
an OBSERVER to become a voting member, and vice versa.

Example of usage:

1. Maintain the Quorum size without changing the cluster size - in a 5
node cluster with 2 observers, I decide to decommission a voting
member. Then, I would like to configure one of my observers to be a
follower without any down time.

2. Add a new server to the cluster that has better resources than
one of the voting peers. Make the new node a voting peer and the old
one an observer.

3. Reduce the number of voting members for performance reasons.
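The arithmetic behind these scenarios, as a small sketch assuming majority quorums over voting members only (observers never count; the function names are hypothetical):

```c
/* Majority quorum size for a given number of voting members. */
int quorum_size(int voters)
{
    return voters / 2 + 1;
}

/* Number of voting members that can fail while a quorum still survives. */
int tolerated_failures(int voters)
{
    return voters - quorum_size(voters);
}
```

In scenario 1, a 5-node cluster with 2 observers has 3 voters and survives one voter failure; decommissioning a voter leaves 2 voters that survive none, and promoting an observer restores 3.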

A fix for ZOOKEEPER-107 might automatically give us this functionality.
It would be good to confirm that and, if needed, highlight any work
required in addition to ZOOKEEPER-107.

2448 No Perforce job exists for this issue. 0 42069
3 years, 40 weeks, 3 days ago 0|i07kdr:
ZooKeeper ZOOKEEPER-1043

Looped NPE at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:244)

Bug Open Major Unresolved Unassigned César Álvarez Núñez César Álvarez Núñez 04/Apr/11 11:34   28/Aug/15 16:03   3.3.3, 3.4.6       2 8   Sparc Solaris 10 and 11
Java 6u17 64 bits
5 nodes ensemble
I'm sorry, but I only have this log (from a "follower" node) and an earlier message [Unexpected NodeCreated event after a reconnection.|http://mail-archives.apache.org/mod_mbox/zookeeper-user/201103.mbox/%3CAANLkTi=vmZ5v4W6FMhWg4XO6rJT89eGozGUE840bku0_@mail.gmail.com%3E] in which I describe a potential side effect on the client side.

{noformat}
2011-04-04 09:31:09,608 - INFO [Snapshot Thread:FileTxnSnapLog@208][] - Snapshotting: 1700527e36
2011-04-04 09:31:09,653 - INFO [SyncThread:1:FileTxnLog@197][] - Creating new log file: log.1700527e38
2011-04-04 10:13:39,287 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@251][] - Accepted socket connection from /XXX.XXX.XXX.69:1093
2011-04-04 10:13:39,371 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn@777][] - Client attempting to establish new session at /XXX.XXX.XXX.69:1093
2011-04-04 10:13:39,376 - INFO [CommitProcessor:1:NIOServerCnxn@1580][] - Established session 0x12ee79c4a720022 with negotiated timeout 20000 for client /XXX.XXX.XXX.69:1093
2011-04-04 12:04:11,131 - INFO [SyncThread:1:FileTxnLog@197][] - Creating new log file: log.170053bf15
2011-04-04 12:04:11,131 - INFO [Snapshot Thread:FileTxnSnapLog@208][] - Snapshotting: 170053bf17
2011-04-04 12:13:10,779 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@251][] - Accepted socket connection from /XXX.XXX.XXX.63:1817
2011-04-04 12:13:10,790 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn@777][] - Client attempting to establish new session at /XXX.XXX.XXX.63:1817
2011-04-04 12:13:10,794 - INFO [CommitProcessor:1:NIOServerCnxn@1580][] - Established session 0x12ee79c4a720023 with negotiated timeout 20000 for client /XXX.XXX.XXX.63:1817
2011-04-04 12:13:10,814 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn@634][] - EndOfStreamException: Unable to read additional data from client sessionid 0x12ee79c4a720023, likely client has closed socket
2011-04-04 12:13:10,816 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn@1435][] - Closed socket connection for client /XXX.XXX.XXX.63:1817 which had sessionid 0x12ee79c4a720023
2011-04-04 12:13:10,839 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@251][] - Accepted socket connection from /XXX.XXX.XXX.63:1814
2011-04-04 12:13:10,840 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@274][] - Ignoring exception
java.net.SocketException: Invalid argument
at sun.nio.ch.Net.setIntOption0(Native Method)
at sun.nio.ch.Net.setIntOption(Unknown Source)
at sun.nio.ch.SocketChannelImpl$1.setInt(Unknown Source)
at sun.nio.ch.SocketOptsImpl.setBoolean(Unknown Source)
at sun.nio.ch.SocketOptsImpl$IP$TCP.noDelay(Unknown Source)
at sun.nio.ch.OptionAdaptor.setTcpNoDelay(Unknown Source)
at sun.nio.ch.SocketAdaptor.setTcpNoDelay(Unknown Source)
at org.apache.zookeeper.server.NIOServerCnxn.<init>(NIOServerCnxn.java:1367)
at org.apache.zookeeper.server.NIOServerCnxn$Factory.createConnection(NIOServerCnxn.java:215)
at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:256)
2011-04-04 12:13:10,841 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception
java.lang.NullPointerException
at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:244)
2011-04-04 12:13:10,841 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception
java.lang.NullPointerException
at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:244)
2011-04-04 12:13:10,842 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception
java.lang.NullPointerException
at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:244)
...
...
...
2011-04-04 16:49:23,101 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception
java.lang.NullPointerException
at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:244)
{noformat}
2449 No Perforce job exists for this issue. 3 32758
4 years, 29 weeks, 6 days ago 0|i05yxj:
ZooKeeper ZOOKEEPER-1042

ZOOKEEPER-1037 Generate zookeeper test jar for maven installation

Sub-task Closed Major Fixed Ivan Kelly Ivan Kelly Ivan Kelly 31/Mar/11 11:37   23/Nov/11 14:22 01/Apr/11 13:22   3.4.0 contrib-bookkeeper, contrib-hedwig   0 2   Bookkeeper and hedwig both need access to zookeeper test classes. This JIRA is to provide that. 47503 No Perforce job exists for this issue. 4 33339
8 years, 51 weeks, 5 days ago
Reviewed
0|i062in:
ZooKeeper ZOOKEEPER-1041

ZOOKEEPER-1037 get hudson running on bookkeeper

Sub-task Resolved Major Implemented Unassigned Benjamin Reed Benjamin Reed 30/Mar/11 01:32   08/Oct/13 18:45 08/Oct/13 18:45     contrib-bookkeeper, contrib-hedwig   0 0   setup hudson to run on bookkeeper code 2450 No Perforce job exists for this issue. 0 42070
9 years, 1 day ago 0|i07kdz:
ZooKeeper ZOOKEEPER-1040

ZOOKEEPER-1037 create bookkeeper webpage

Sub-task Resolved Major Fixed Unassigned Benjamin Reed Benjamin Reed 30/Mar/11 01:31   28/Apr/11 19:09 28/Apr/11 19:09     contrib-bookkeeper, contrib-hedwig   0 0   create a webpage for bookkeeper 47504 No Perforce job exists for this issue. 0 33340
9 years, 1 day ago 0|i062iv:
ZooKeeper ZOOKEEPER-1039

ZOOKEEPER-1037 give bookkeeper committers access to bookkeeper svn

Sub-task Resolved Major Fixed Unassigned Benjamin Reed Benjamin Reed 30/Mar/11 01:30   28/Apr/11 19:09 28/Apr/11 19:09     contrib-bookkeeper, contrib-hedwig   0 0   need to give ivan, utkarsh, and dhruba svn access to bookkeeper svn 47505 No Perforce job exists for this issue. 0 33341
9 years, 1 day ago 0|i062j3:
ZooKeeper ZOOKEEPER-1038

ZOOKEEPER-1037 Move bookkeeper and hedwig code in subversion

Sub-task Resolved Major Fixed Unassigned Benjamin Reed Benjamin Reed 30/Mar/11 01:28   05/Apr/11 16:05 05/Apr/11 16:05     contrib-bookkeeper, contrib-hedwig   0 0   need to do an svn move of the hedwig and bookkeeper code to the bookkeeper subversion 47506 No Perforce job exists for this issue. 0 33342
8 years, 51 weeks, 6 days ago 0|i062jb:
ZooKeeper ZOOKEEPER-1037

Create BookKeeper subproject

Task Resolved Major Implemented Unassigned Benjamin Reed Benjamin Reed 30/Mar/11 01:27   08/Oct/13 18:45 08/Oct/13 18:45     contrib-bookkeeper, contrib-hedwig   0 0   ZOOKEEPER-1038, ZOOKEEPER-1039, ZOOKEEPER-1040, ZOOKEEPER-1041, ZOOKEEPER-1042 move the hedwig and bookkeeper code to the bookkeeper subproject 2451 No Perforce job exists for this issue. 0 42071
9 years, 1 day ago 0|i07ke7:
ZooKeeper ZOOKEEPER-1036

send UPTODATE to follower until a quorum of servers synced with leader

Bug Resolved Major Not A Problem Unassigned jiangwen wei jiangwen wei 28/Mar/11 20:46   31/Mar/11 16:26 31/Mar/11 16:26     server   0 0   1. Current process
When the leader fails, a new leader is elected and the followers sync with the
new leader.
Once a follower has synced, the leader sends it UPTODATE.

2. A corner case
However, there is a corner case in which things go wrong.
Suppose message M exists only on the leader. After a follower syncs with the
leader, clients connected to that follower will see M.
But M then exists on only two servers, not on a quorum of servers. If the new
leader and the follower fail, M is lost, even though a client has already
seen it.

3. One solution
So I think UPTODATE should be sent to a follower only once a quorum of servers
has synced with the leader.
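The proposal can be sketched as a counter on the leader (hypothetical names; the actual leader code is Java, and this issue was resolved as Not A Problem, so this only illustrates the suggestion): the leader counts itself plus each follower that has finished syncing, and UPTODATE is allowed only once that count is a strict majority of the voting members.

```c
#include <stdbool.h>

/* Hypothetical leader-side sync tracker. */
struct sync_state {
    int voters;  /* voting members, including the leader */
    int synced;  /* members known to hold the leader's history */
};

/* The leader trivially holds its own history, so counting starts at 1. */
void sync_init(struct sync_state *s, int voters)
{
    s->voters = voters;
    s->synced = 1;
}

/* Records that one more follower finished syncing (capped at voters). */
void follower_synced(struct sync_state *s)
{
    if (s->synced < s->voters)
        s->synced++;
}

/* UPTODATE may be sent only when a strict majority holds the history. */
bool may_send_uptodate(const struct sync_state *s)
{
    return s->synced > s->voters / 2;
}
```

In a 3-server ensemble, the leader alone is not enough; one synced follower makes a quorum of two, and only then would UPTODATE go out.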
214215 No Perforce job exists for this issue. 0 32759
9 years ago 0|i05yxr:
ZooKeeper ZOOKEEPER-1035

CREATOR_ALL_ACL does not work together with IPAuthenticationProvider

Bug Open Major Unresolved Unassigned Christian Ziech Christian Ziech 28/Mar/11 04:36   07/Sep/11 10:49   3.3.1, 3.3.2   server   0 2   We were trying to use the predefined ACL "Ids.CREATOR_ALL_ACL" together with the default IP authentication. Unfortunately, it seems this cannot work, because PrepRequestProcessor.fixupACL() checks the return value of AuthenticationProvider.isAuthenticated() (IPAuthenticationProvider in our case).
This provider always returns false, so Ids.CREATOR_ALL_ACL is always rejected.
2452 No Perforce job exists for this issue. 0 32760
9 years, 3 days ago 0|i05yxz:
ZooKeeper ZOOKEEPER-1034

perl bindings should automatically find the zookeeper c-client headers

Bug Closed Minor Fixed Nicholas Harteau Nicholas Harteau Nicholas Harteau 27/Mar/11 16:45   23/Nov/11 14:22 14/Aug/11 21:41 3.3.3 3.4.0 contrib   0 2   Installing Net::ZooKeeper from CPAN or the ZooKeeper distribution tarballs always fails because the C client header files are not found. In conjunction with ZOOKEEPER-1033, update the Perl bindings to look for the C client header files in INCDIR/zookeeper/.

a.k.a. make installs of Net::ZooKeeper via cpan/cpanm/whatever *just work*, assuming you've already got the zookeeper c client installed.
47507 No Perforce job exists for this issue. 4 32761
8 years, 32 weeks, 3 days ago Net::ZooKeeper now looks in some sane places for the c client includes
Reviewed
0|i05yy7:
ZooKeeper ZOOKEEPER-1033

c client should install includes into INCDIR/zookeeper, not INCDIR/c-client-src

Bug Closed Minor Fixed Nicholas Harteau Nicholas Harteau Nicholas Harteau 27/Mar/11 16:40   23/Nov/11 14:22 04/May/11 02:03 3.3.3 3.4.0 c client   0 2   Header files are installed into foo/include/c-client-src/, which doesn't indicate a relationship with ZooKeeper and doesn't correspond to foo/lib/libzookeeper*.

Header files should be installed into foo/include/zookeeper/, as this is the common practice.
47508 No Perforce job exists for this issue. 1 32762
8 years, 47 weeks, 1 day ago Install c-client header files into include/zookeeper/ rather than include/c-client-src/ 0|i05yyf:
ZooKeeper ZOOKEEPER-1032

speed up recovery from leader failure

Improvement Open Major Unresolved Unassigned jiangwen wei jiangwen wei 27/Mar/11 06:03   05/Feb/20 07:16     3.7.0, 3.5.8 server   1 4   When the number of nodes is large, recovery from leader failure may take a long time.
There are some points to improve:

1. Followers should take snapshots asynchronously once they are up to date.

2. Currently, the leader and followers clear the DataTree on leader failure and then restore it from a snapshot and the transaction logs. The DataTree should not be cleared; it should be restored from the transaction logs only.

3. FileTxnLog should keep recent transaction logs in memory, so that when the DataTree is not far behind the transaction logs, the in-memory logs can be used to restore it.
2453 No Perforce job exists for this issue. 0 42072
9 years, 3 days ago 0|i07kef:
ZooKeeper ZOOKEEPER-1031

Introduce virtual cluster IP and start that cluster IP on the host running ZK leader

Wish Open Minor Unresolved Unassigned Vishal Kher Vishal Kher 25/Mar/11 22:29   28/May/15 01:36   3.3.3 4.0.0 leaderElection, quorum   2 6   It would be useful to be able to specify a virtual (floating) IP for the ZK cluster (say, in zoo.cfg). The ZK leader would bring up this IP on one of its interfaces. If leadership changes, the cluster IP would be taken over by the new leader. This IP could be used to identify the ZK leader and to send administrative commands/queries to it. For example,
- a ZK client could get the list of ZK servers in the configuration by sending a request to the server holding this IP address. The client needs to know only one IP address, and availability of the cluster automatically ensures availability of the IP address.
- To reconfigure the ZK ensemble, a client could send a reconfig request to the server on this IP and keep retrying until the request succeeds or fails.

Implementation issues:
1. An old ZK leader that has lost leadership must somehow give up the virtual IP address; otherwise, it could lead to collisions. One solution is for it to reboot itself. A system property could specify how to unplumb the cluster IP.
2. Cross-platform support.
3. Refreshing ARP caches.
2454 No Perforce job exists for this issue. 0 42073
4 years, 43 weeks ago 0|i07ken:
ZooKeeper ZOOKEEPER-1030

Increase default for maxClientCnxns

Improvement Closed Trivial Fixed Todd Lipcon Todd Lipcon Todd Lipcon 25/Mar/11 18:37   04/Sep/14 21:26 08/Apr/11 19:41 3.2.2 3.4.0     0 3   The default for maxClientCnxns is 10, which is too low for many applications. For example, HBase users often run MR jobs where each task needs to use ZooKeeper to talk to HBase. This means that each slot on the tasktracker will have at least one ZK connection. With today's beefy machines, that's easily 20+ connections per node.

I would suggest bumping the default to 60, which will still protect against runaway nodes (e.g. a leak in a tight loop) but won't impact MR jobs that need to talk to ZK.
37454 No Perforce job exists for this issue. 2 30003
5 years, 28 weeks, 6 days ago
Reviewed
0|i05hxr:
ZooKeeper ZOOKEEPER-1029

C client bug in zookeeper_init (if bad hostname is given)

Bug Closed Blocker Fixed Flavio Paiva Junqueira Dheeraj Agrawal Dheeraj Agrawal 25/Mar/11 16:47   25/Dec/18 04:42 11/Dec/15 15:15 3.3.2, 3.4.6, 3.5.0 3.4.7, 3.5.2, 3.6.0 c client   3 18   If you give an invalid hostname to zookeeper_init, it cannot resolve it, so it performs cleanup (freeing buffers, completion lists, etc.). adaptor_init() is not called on this code path, so the lock and condition variables (for the adaptor and the completion lists) are not initialized.

As part of the cleanup, it locks and unlocks the completion lists even though their locks have not been initialized, so unlocking fails:
{noformat}
lock_completion_list(&zh->sent_requests);   /* pthread mutex/cond not initialized */
tmp_list = zh->sent_requests;
zh->sent_requests.head = 0;
zh->sent_requests.last = 0;
unlock_completion_list(&zh->sent_requests); /* broadcasts on an uninitialized cond */
{noformat}

It should check whether locking succeeds before unlocking. If locking fails, appropriate error handling has to be done.
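A minimal sketch of one safe approach (the struct and field names are hypothetical, not the C client's actual layout): record whether the mutex and condition variable were initialized, and skip lock, broadcast, and unlock when they were not. It tracks initialization instead of checking the lock's return value, because POSIX leaves locking an uninitialized mutex undefined, so the call cannot be relied on to fail. It assumes the structure starts zero-filled (for example, allocated with calloc), so the flag reads false until initialization succeeds.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical completion list guarded by a mutex and condition variable. */
struct completion_list {
    void *head;
    void *last;
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool sync_initialized;  /* set only after both init calls succeed */
};

/* Returns 0 on success; on failure the list stays marked uninitialized. */
int completion_list_init(struct completion_list *l)
{
    l->head = l->last = NULL;
    l->sync_initialized = false;
    if (pthread_mutex_init(&l->lock, NULL) != 0)
        return -1;
    if (pthread_cond_init(&l->cond, NULL) != 0) {
        pthread_mutex_destroy(&l->lock);
        return -1;
    }
    l->sync_initialized = true;
    return 0;
}

/* Empties the list, touching the mutex/cond only if they were initialized,
 * so it is safe on an error path that runs before initialization. */
void completion_list_clear(struct completion_list *l)
{
    if (l->sync_initialized)
        pthread_mutex_lock(&l->lock);
    l->head = NULL;
    l->last = NULL;
    if (l->sync_initialized) {
        pthread_cond_broadcast(&l->cond);
        pthread_mutex_unlock(&l->lock);
    }
}
```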
2455 No Perforce job exists for this issue. 8 32763
1 year, 12 weeks, 2 days ago
Reviewed
c client, adaptor_init, zookeeper_init, bad hostname 0|i05yyn:
ZooKeeper ZOOKEEPER-1028

In python bindings, zookeeper.set2() should return a stat dict but instead returns None

Bug Closed Minor Fixed Chris Medaglia Chris Medaglia Chris Medaglia 24/Mar/11 16:26   23/Nov/11 14:22 06/Apr/11 16:22 3.3.3 3.4.0 contrib-bindings   0 3 3600 3600 0% All environments. There is a small bug in the python bindings, specifically with the zookeeper.set2() call. This method should return a stat dictionary, but actually returns None. The fix is a one-character change to zookeeper.c such that the return value is '&stat' rather than 'stat'. 0% 0% 3600 3600 patch 47509 No Perforce job exists for this issue. 2 32764
8 years, 51 weeks ago
Reviewed
0|i05yyv:
ZooKeeper ZOOKEEPER-1027

chroot not transparent in zoo_create()

Bug Closed Critical Fixed Thijs Terlouw Thijs Terlouw Thijs Terlouw 24/Mar/11 01:06   28/Sep/15 13:33 25/Jul/11 13:45 3.3.3 3.4.0 c client   0 5   ZOOKEEPER-1150 Linux, ZooKeeper 3.3.3, C-client, java 1.6.0_17-b04, hotspot server vm I've recently started to use the chroot functionality (introduced in
3.2.0) as part of my connect string. It mostly works as expected, but
there is one case that is unexpected: when I create a path with
zoo_create() I can retrieve the created path. This is very useful when
you set the ZOO_SEQUENCE flag. Unfortunately the returned path
includes the chroot as part of the path. This was unexpected to me: I
expected that the chroot would be totally transparent. The
documentation for zoo_create() says:
"path_buffer : Buffer which will be filled with the path of the new
node (this might be different than the supplied path because of the
ZOO_SEQUENCE flag)."

This gave me the impression that this flag is the only reason the
returned path is different from the created path, but apparently it's
not. Is this a bug or intended behavior?
I currently work around this issue by remembering the chroot in
my wrapper code; after a call to zoo_create() I check whether the returned
path starts with the chroot. If it does, I remove it.

My use case is to create a path with a sequence number and then delete
this path later. Unfortunately, I cannot delete the path because the
chroot is prepended to it, so the delete would apply the chroot twice.

I believe this only affects the create functions.
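The workaround described above can be sketched as follows (strip_chroot is a hypothetical helper, not part of the C client API): if the returned path starts with the chroot at a path-component boundary, shift the remainder to the front of the buffer in place; a path equal to the chroot becomes "/".

```c
#include <string.h>

/* Removes a leading chroot (e.g. "/app") from path in place.
 * Returns 1 if the prefix was stripped, 0 if path did not start with it. */
int strip_chroot(char *path, const char *chroot)
{
    size_t n = strlen(chroot);
    if (n == 0 || strncmp(path, chroot, n) != 0)
        return 0;
    if (path[n] == '\0') {          /* path is exactly the chroot */
        path[0] = '/';
        path[1] = '\0';
        return 1;
    }
    if (path[n] != '/')             /* "/app" must not match "/apple" */
        return 0;
    memmove(path, path + n, strlen(path + n) + 1);  /* include the NUL */
    return 1;
}
```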
47510 No Perforce job exists for this issue. 5 32765
8 years, 33 weeks, 1 day ago Correctly removes the chroot from the returned path in a call to zoo_create()
Reviewed
chroot zookeeper zoo_create 0|i05yz3:
ZooKeeper ZOOKEEPER-1026

Sequence number assignment decreases after old node rejoins cluster

Bug Open Major Unresolved Unassigned Jeremy Stribling Jeremy Stribling 22/Mar/11 13:16   25/Mar/11 14:25   3.3.3   server   0 1   I ran into a weird case where a Zookeeper server rejoins the cluster after missing several operations, and then a client creates a new sequential node that has a number earlier than the last node it created. I don't have full logs, or a live system in this state, or any data directories, just some partial server logs and the evidence as seen by the client. Haven't tried reproducing it yet, just wanted to see if anyone here had any ideas. Here's the scenario (probably more info than necessary, but trying to be complete)

1) Initially (5:37:20): 3 nodes up, with ids 215, 126, and 37 (called nodes #1, #2, and #3 below):
2) Nodes periodically (and throughout this whole timeline) create sequential, non-ephemeral nodes under the /zkrsm parent node.
3) 5:46:57: Node #1 gets notified of /zkrsm/0000000000000000_record0000002116
4) 5:47:06: Node #1 restarts and rejoins
5) 5:49:26: Node #2 gets notified of /zkrsm/0000000000000000_record0000002708
6) 5:49:29: Node #2 restarts and rejoins
7) 5:52:01: Node #3 gets notified of /zkrsm/0000000000000000_record0000003291
8) 5:52:02: Node #3 restarts and begins the rejoining process
9) 5:52:08: Node #1 successfully creates /zkrsm/0000000000000000_record0000003348
10) 5:52:08: Node #2 dies after getting notified of /zkrsm/0000000000000000_record0000003348
11) 5:52:10ish: Node #3 is elected leader (the ZK server log doesn't have wallclock timestamps, so not exactly sure on the ordering of this step)
12) 5:52:15: Node #1 successfully creates /zkrsm/0000000000000000_record0000003292

Note that the node created in step #12 is lower than the one created in step #9, and is exactly one greater than the last node seen by node #3 before it restarted.

Here is the sequence of session establishments as seen from the C client of node #1 after its restart (the IP address of node #1=13.0.0.11, #2=13.0.0.12, #3=13.0.0.13):

2011-03-18 05:46:59,838:17454(0x7fc57d3db710):ZOO_INFO@check_events@1632: session establishment complete on server [13.0.0.13:2888], sessionId=0x252ec780a3020000, negotiated timeout=6000
2011-03-18 05:49:32,194:17454(0x7fc57cbda710):ZOO_INFO@check_events@1632: session establishment complete on server [13.0.0.13:2888], sessionId=0x252ec782f5100002, negotiated timeout=6000
2011-03-18 05:52:02,352:17454(0x7fc57d3db710):ZOO_INFO@check_events@1632: session establishment complete on server [13.0.0.12:2888], sessionId=0x7e2ec782ff5f0001, negotiated timeout=6000
2011-03-18 05:52:08,583:17454(0x7fc57d3db710):ZOO_INFO@check_events@1632: session establishment complete on server [13.0.0.11:2888], sessionId=0x7e2ec782ff5f0001, negotiated timeout=6000
2011-03-18 05:52:13,834:17454(0x7fc57cbda710):ZOO_INFO@check_events@1632: session establishment complete on server [13.0.0.11:2888], sessionId=0xd72ec7856d0f0001, negotiated timeout=6000

I will attach logs for all nodes after each of their restarts, and a partial log for node #3 from before its restart.
2456 No Perforce job exists for this issue. 1 32766
9 years, 6 days ago sequential 0|i05yzb:
ZooKeeper ZOOKEEPER-1025

zkCli is overly sensitive to spaces.

Improvement Closed Major Fixed Laxman Jonathan Hsieh Jonathan Hsieh 21/Mar/11 20:40   23/Nov/11 14:22 18/Aug/11 16:20 3.3.3, 3.4.0 3.4.0 java client   0 2   Here's an example:

I do an ls to get znode names. I try to stat a znode.
{code}
[zk: localhost:3181(CONNECTED) 1] ls /flume-nodes
[nodes0000000002, nodes0000000001, nodes0000000000, nodes0000000005, nodes0000000004, nodes0000000003]
[zk: localhost:3181(CONNECTED) 3] stat /flume-nodes/nodes0000000002
cZxid = 0xb
ctime = Sun Mar 20 23:24:03 PDT 2011
... (success)
{code}

Here's something that looks almost the same. Notice the extra space in front of the znode name.

{code}
[zk: localhost:3181(CONNECTED) 2] stat  /flume-nodes/nodes0000000002
Command failed: java.lang.IllegalArgumentException: Path length must be > 0
{code}

This seems like unexpected behavior.
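
A small sketch of how an extra space can produce this error, assuming the command line is split on a single space character (an assumption; the zkCli parser itself is not shown in this issue). Splitting on " " leaves an empty token in the path position, while trimming and splitting on runs of whitespace does not. CommandTokenizer, naiveSplit and tokenize are hypothetical names.

```java
// Hypothetical helper for illustration; not zkCli's actual parser.
final class CommandTokenizer {
    private CommandTokenizer() {}

    /** Naive split on a single space: consecutive spaces yield empty tokens. */
    static String[] naiveSplit(String line) {
        return line.split(" ");
    }

    /** Whitespace-tolerant split: ignores leading, trailing and repeated whitespace. */
    static String[] tokenize(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new String[0]; // blank input: no command at all
        }
        return trimmed.split("\\s+");
    }
}
```

For example, naiveSplit("stat  /flume-nodes/nodes0000000002") returns {"stat", "", "/flume-nodes/nodes0000000002"}, so the second token is an empty path, whereas tokenize returns {"stat", "/flume-nodes/nodes0000000002"}.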
40928 No Perforce job exists for this issue. 2 33343
8 years, 31 weeks, 6 days ago 0|i062jj:
ZooKeeper ZOOKEEPER-1024

let path be binary

New Feature Open Major Unresolved Unassigned jiangwen wei jiangwen wei 19/Mar/11 06:20   02/May/11 23:42       server   0 1   Let paths be binary rather than strings. Holding paths as strings has overhead,
and the overhead becomes significant when there are millions of nodes.

Sometimes ZK is used as a highly available metadata database.
Some of that data is binary, and converting it to strings also adds significant overhead.
2457 No Perforce job exists for this issue. 0 42074
8 years, 47 weeks, 2 days ago 0|i07kev:
ZooKeeper ZOOKEEPER-1023

zkpython: add_auth can deadlock the interpreter

Bug Open Minor Unresolved Unassigned Botond Hejj Botond Hejj 19/Mar/11 06:20   05/Feb/20 07:16   3.3.2 3.7.0, 3.5.8 contrib-bindings   0 0   If add_auth is called with a callback and another command is executed right after it, the Python API can deadlock.
Example:

{code}
import zookeeper

def deadlock(a, b):
    pass

def watcher(zh, type, state, path):
    if state == zookeeper.CONNECTED_STATE:
        zookeeper.add_auth(zh, 'test', 'test', deadlock)
        zookeeper.get_children(zh, '/')

zh = zookeeper.init("host:port", watcher)
{code}

Looking at the code, the problem appears to be the following:
the synchronous get_children call runs on the main thread while holding the GIL, and it blocks until get_children finishes. Meanwhile, on the other thread, the add_auth callback is invoked and tries to acquire the GIL in order to call the Python callback. So that thread is waiting for the main thread to release the GIL, while the main thread is waiting for the other thread to process the get_children reply.

I am not an expert on the Python bindings, but I think this could be solved by releasing the GIL before synchronous C API calls.
2458 No Perforce job exists for this issue. 1 32767
9 years, 2 days ago 0|i05yzj:
ZooKeeper ZOOKEEPER-1022

let the children under a ZNode in order.

New Feature Open Major Unresolved Unassigned jiangwen wei jiangwen wei 19/Mar/11 06:13   29/Apr/11 11:34       server   2 3   Keep the children under a ZNode in order, and let the user specify a comparator for each parent ZNode.
Sometimes we only need some of the children rather than all of them, for example only the first child.
Some applications could also take advantage of the ordering; for example, HBase could put its meta table into ZK.
2459 No Perforce job exists for this issue. 0 42075
9 years, 2 days ago 0|i07kf3:
ZooKeeper ZOOKEEPER-1020

Implement function in C client to determine which host you're currently connected to.

New Feature Closed Minor Fixed Stephen Tyree Stephen Tyree Stephen Tyree 15/Mar/11 10:39   23/Nov/11 14:22 16/Mar/11 21:01   3.4.0 c client   0 0   On occasion it might be useful to determine which host your Zookeeper client is currently connected to, be it for debugging purposes or otherwise. A possible signature for that function:

{code}
const char* zoo_get_connected_host(zhandle_t *zh, char *buffer, size_t buffer_size, unsigned short *port);
{code}

Clients could use it like below:

{code}
char buffer[33];
unsigned short port = 0;
if (!zoo_get_connected_host(zh, buffer, sizeof(buffer), &port))
    return EXIT_FAILURE;

printf("The connected host is: %s:%d\n", buffer, port);
{code}
47511 No Perforce job exists for this issue. 1 33344
9 years, 2 weeks ago
Reviewed
0|i062jr:
ZooKeeper ZOOKEEPER-1019

zkfuse doesn't list dependency on boost in README

Improvement Closed Major Fixed Raúl Gutiérrez Segalés Karel Vervaeke Karel Vervaeke 15/Mar/11 09:48   13/Mar/14 14:17 10/Dec/13 15:45 3.4.0 3.4.6, 3.5.0 contrib   0 5 300 300 0% The README.txt under contrib/fuse doesn't list boost under Development build libraries. 0% 0% 300 300 2460 No Perforce job exists for this issue. 1 42076
6 years, 2 weeks ago 0|i07kfb:
ZooKeeper ZOOKEEPER-1018

The connection permutation in get_addrs uses a weak and inefficient shuffle

Improvement Closed Minor Fixed Stephen Tyree Stephen Tyree Stephen Tyree 15/Mar/11 08:47   23/Nov/11 14:22 04/Apr/11 17:09 3.3.2 3.4.0 c client   0 0 7200 7200 0% After determining all of the addresses in the get_addrs function in the C client, the address list is permuted using the following code:

{code}
setup_random();
/* Permute */
for(i = 0; i < zh->addrs_count; i++) {
    struct sockaddr_storage *s1 = zh->addrs + random()%zh->addrs_count;
    struct sockaddr_storage *s2 = zh->addrs + random()%zh->addrs_count;
    if (s1 != s2) {
        struct sockaddr_storage t = *s1;
        *s1 = *s2;
        *s2 = t;
    }
}
{code}

Not only does this shuffle produce a biased (non-uniform) permutation, it is also half as efficient as the Fisher-Yates shuffle, which produces an unbiased one: the loop above makes two random() calls per element, whereas Fisher-Yates makes one per element. Switching to Fisher-Yates seems like a simple fix that would improve both the randomness and the efficiency of the shuffle.
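
For reference, a minimal sketch of the Fisher-Yates (Durstenfeld) shuffle. It is written in Java with java.util.Random purely to illustrate the algorithm; a fix in the C client would apply the same loop to zh->addrs, where picking the index with random() % (i + 1) carries a slight modulo bias. The loop makes n - 1 random draws and swaps, and every ordering is equally likely as long as each index is drawn uniformly from [0, i].

```java
import java.util.Random;

// Illustrative sketch of an unbiased in-place Fisher-Yates shuffle.
final class FisherYates {
    private FisherYates() {}

    static <T> void shuffle(T[] items, Random rng) {
        // Walk from the last slot down to the second; each slot i is swapped with
        // a uniformly chosen index in [0, i], so each permutation has probability 1/n!.
        for (int i = items.length - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1); // uniform in [0, i]
            T tmp = items[i];
            items[i] = items[j];
            items[j] = tmp;
        }
    }
}
```

The JDK's Collections.shuffle uses the same backward-walking approach, so on the Java side an existing library call would also do.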
0% 0% 7200 7200 47512 No Perforce job exists for this issue. 1 33345
8 years, 51 weeks, 2 days ago
Reviewed
0|i062jz:
ZooKeeper ZOOKEEPER-1017

Follower.followLeader throws SocketException, then shutdown Follower

Bug Open Major Unresolved Unassigned tom liu tom liu 15/Mar/11 05:59   15/Mar/11 05:59   3.3.3   quorum   0 1   JDK1.6.0_17/CentOS5.5 I use three nodes to deploy a ZooKeeper cluster, but a follower node throws a SocketException twice every day.
{noformat}
2011-03-15 14:15:48,260 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@90] - Exception when following the leader
java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
at org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:126)
at org.apache.zookeeper.server.quorum.Learner.ping(Learner.java:361)
at org.apache.zookeeper.server.quorum.Follower.processPacket(Follower.java:116)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:80)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:644)
{noformat}

I found that the reason is that the Follower does not respond to the Leader's ping in time.
So I added some logging, and finally found the following in org.apache.zookeeper.server.SyncRequestProcessor:
{noformat}
public void processRequest(Request request) {
    // request.addRQRec(">sync");
    //TODO tom liu added
    if(LOG.isDebugEnabled()) {
        LOG.debug("Processing request::" + request);
    }
    queuedRequests.add(request);
    //TODO tom liu added
    if(LOG.isDebugEnabled()) {
        LOG.debug("Processing request::" + request);
    }
}
{noformat}

The resulting log is:
2011-03-15 14:15:34,515 - DEBUG [QuorumPeer:/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@189] - Processing request::sessionid:0x22e9907b5d50000 type:setData cxid:0x70b55 zxid:0xd50000a73f txntype:5 reqpath:n/a
2011-03-15 14:15:48,259 - DEBUG [QuorumPeer:/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@194] - Processing request::sessionid:0x22e9907b5d50000 type:setData cxid:0x70b55 zxid:0xd50000a73f txntype:5 reqpath:n/a

So the elapsed time is 13744 ms (from 14:15:34,515 to 14:15:48,259). The ia.readRecord call in LearnerHandler's run method times out, the Leader shuts down, and leader re-election starts.

My question is: why does the queuedRequests.add statement take so long?
2461 No Perforce job exists for this issue. 0 32768
9 years, 2 weeks, 2 days ago 0|i05yzr:
ZooKeeper ZOOKEEPER-1015

DateFormat.getDateTimeInstance() is very expensive, we can cache it to improve performance

Bug Patch Available Major Unresolved Bill Havanki Xiaoming Shi Xiaoming Shi 12/Mar/11 22:00   02/Mar/16 21:44   3.3.2   server   0 2   In the file
{noformat}
./zookeeper-3.3.2/src/java/main/org/apache/zookeeper/server/PurgeTxnLog.java line:103
{noformat}

DateFormat.getDateTimeInstance() is called on every iteration of the for loop. We can cache the result to improve performance.

This is similar to the Apache bug https://issues.apache.org/bugzilla/show_bug.cgi?id=48778

Similar code can be found in:
{noformat}
./zookeeper-3.3.2/src/java/main/org/apache/zookeeper/server/TraceFormatter.java
./zookeeper-3.3.2/src/java/main/org/apache/zookeeper/server/LogFormatter.java
{noformat}
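
A minimal sketch of the caching idea, assuming a hypothetical list of timestamps stands in for the files that PurgeTxnLog iterates over: the formatter is created once, outside the loop. Because DateFormat instances are not thread-safe, a formatter shared across threads needs to be confined per thread (for example with ThreadLocal), or replaced by java.time.format.DateTimeFormatter, which is immutable and thread-safe. CachedDateFormat, formatAll and formatShared are hypothetical names.

```java
import java.text.DateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

// Hypothetical example class; not part of ZooKeeper.
final class CachedDateFormat {
    private CachedDateFormat() {}

    // One formatter per thread, created lazily on first use in that thread.
    private static final ThreadLocal<DateFormat> PER_THREAD =
            ThreadLocal.withInitial(DateFormat::getDateTimeInstance);

    /** Formats each timestamp, creating the DateFormat once instead of once per iteration. */
    static List<String> formatAll(List<Long> timestampsMillis) {
        DateFormat fmt = DateFormat.getDateTimeInstance(); // hoisted out of the loop
        List<String> out = new ArrayList<>(timestampsMillis.size());
        for (long ts : timestampsMillis) {
            out.add(fmt.format(new Date(ts)));
        }
        return out;
    }

    /** Formats one timestamp with the calling thread's cached formatter. */
    static String formatShared(long timestampMillis) {
        return PER_THREAD.get().format(new Date(timestampMillis));
    }
}
```

The local-variable form is enough for a single loop like the one in PurgeTxnLog; the ThreadLocal form only matters if the formatter is kept in a shared field.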
newbie 2463 No Perforce job exists for this issue. 1 32769
4 years, 3 weeks ago 0|i05yzz:
ZooKeeper ZOOKEEPER-1014

DateFormat.getDateTimeInstance() is very expensive, we can cache it to improve performance

Bug Resolved Major Duplicate Unassigned Xiaoming Shi Xiaoming Shi 12/Mar/11 12:42   14/Mar/11 23:51 14/Mar/11 23:51 3.3.2   server   0 0   In the file:
{noformat}
./zookeeper-3.3.2/src/java/main/org/apache/zookeeper/server/TraceFormatter.java
{noformat}
DateFormat.getDateTimeInstance() is called on every iteration of the while loop. We can cache the return value to improve performance.

This is similar to the Apache Bug https://issues.apache.org/bugzilla/show_bug.cgi?id=48778
214214 No Perforce job exists for this issue. 0 32770
9 years, 2 weeks, 2 days ago 0|i05z07:
ZooKeeper ZOOKEEPER-1013

zkServer.sh usage message should mention all startup options

Bug Closed Trivial Fixed Eugene Joseph Koontz Eugene Joseph Koontz Eugene Joseph Koontz 11/Mar/11 15:28   23/Nov/11 14:22 15/Mar/11 14:39   3.4.0 server   0 1 300 300 0% currently the "Usage" message for zkServer shows:

echo "Usage: $0 {start|stop|restart|status}"

But it seems to me that it should show the other startup options as well, which are currently: start-foreground, upgrade, print-cmd.

0% 0% 300 300 47513 No Perforce job exists for this issue. 1 32771
9 years, 2 weeks, 1 day ago patch to zkServer.sh to show all startup options 0|i05z0f:
ZooKeeper ZOOKEEPER-1012

support distinct JVMFLAGS for zookeeper server in zkServer.sh and zookeeper client in zkCli.sh

New Feature Closed Trivial Fixed Eugene Joseph Koontz Eugene Joseph Koontz Eugene Joseph Koontz 11/Mar/11 15:12   26/Jan/12 20:58 16/Mar/11 13:17   3.4.0 server   0 0 300 300 0% 1. Sometimes you might want to run zkServer.sh with different JVMFLAGS than for clients. Make zkServer.sh consult the SERVER_JVMFLAGS variable and, if it exists, add it to the beginning of the existing JVMFLAGS setting.

2. Sometimes you might want to run zkCli.sh with different JVMFLAGS than for servers. Make zkCli.sh consult the CLIENT_JVMFLAGS variable and, if it exists, add it to the beginning of the existing JVMFLAGS setting.
0% 0% 300 300 47514 No Perforce job exists for this issue. 1 33346
9 years, 2 weeks ago
Reviewed
0|i062k7:
ZooKeeper ZOOKEEPER-1011

fix Java Barrier Documentation example's race condition issue and polish up the Barrier Documentation

Bug Open Major Unresolved maoling Semih Salihoglu Semih Salihoglu 09/Mar/11 04:49   20/Jan/19 07:12       documentation   1 6 0 3000   There is a race condition in the Barrier example of the java doc: http://hadoop.apache.org/zookeeper/docs/current/zookeeperTutorial.html. It's in the enter() method. Here's the original example:
{code}
boolean enter() throws KeeperException, InterruptedException {
    zk.create(root + "/" + name, new byte[0], Ids.OPEN_ACL_UNSAFE,
            CreateMode.EPHEMERAL_SEQUENTIAL);
    while (true) {
        synchronized (mutex) {
            List<String> list = zk.getChildren(root, true);

            if (list.size() < size) {
                mutex.wait();
            } else {
                return true;
            }
        }
    }
}
{code}

Here's the race condition scenario:
Let's say there are two machines/nodes: node1 and node2 that will use this code to synchronize over ZK. Let's say the following steps take place:
node1 calls the zk.create method and then reads the number of children, and sees that it's 1 and starts waiting.

node2 calls the zk.create method (doesn't call the zk.getChildren method yet, let's say it's very slow)
node1 is notified that the number of children on the znode changed. It sees that the size is 2, so it passes the barrier, does its work, and then leaves the barrier, deleting its node.

node2 calls zk.getChildren and because node1 has already left, it sees that the number of children is equal to 1. Since node1 will never enter the barrier again, it will keep waiting.

--- End of scenario ---

Here's Flavio's fix suggestions (copying from the email thread):
...
I see two possible action points out of this discussion:

1- State clearly in the beginning that the example discussed is not correct under the assumption that a process may finish the computation before another has started, and the example is there for illustration purposes;
2- Have another example following the current one that discusses the problem and shows how to fix it. This is an interesting option that illustrates how one could reason about a solution when developing with zookeeper.
...

We'll go with the 2nd option.
100% 100% 3000 0 pull-request-available 2464 No Perforce job exists for this issue. 0 32772
1 year, 17 weeks ago 0|i05z0n:
ZooKeeper ZOOKEEPER-1010

ZOOKEEPER-850 Remove or move ManagedUtil to contrib, because it has direct log4j dependencies

Sub-task Resolved Major Duplicate Unassigned Olaf Krische Olaf Krische 08/Mar/11 16:37   25/Apr/12 19:48 25/Apr/12 19:48 3.3.1   java client   1 2   Please move ManagedUtil out of the way. It has direct dependencies on log4j api. 2465 No Perforce job exists for this issue. 0 33347
7 years, 48 weeks, 1 day ago 0|i062kf:
ZooKeeper ZOOKEEPER-1008

ZK should give more specific error on missing myid

Improvement Open Minor Unresolved Unassigned Eric Sammer Eric Sammer 07/Mar/11 12:48   05/Sep/11 07:50   3.3.2   server   0 1   On startup, ZK should specifically test for and provide an error message if the myid file is missing. Currently, the error message is simply "Invalid config" if myid is missing. 2466 No Perforce job exists for this issue. 0 42078
8 years, 29 weeks, 3 days ago 0|i07kfr:
ZooKeeper ZOOKEEPER-1007

iarchive leak in C client

Bug Closed Minor Fixed Jeremy Stribling Jeremy Stribling Jeremy Stribling 04/Mar/11 16:42   23/Nov/11 14:22 15/Mar/11 16:42 3.3.3 3.4.0 c client   0 1   On line 1957, zookeeper_process() returns without cleaning up the "ia" buffer that was previously allocated. I don't know how often this code path is taken, but I thought it was worth reporting. I will attach a simple patch shortly. 47515 No Perforce job exists for this issue. 2 32773
9 years, 2 weeks, 1 day ago
Reviewed
0|i05z0v:
ZooKeeper ZOOKEEPER-1006

QuorumPeer "Address already in use" -- regression in 3.3.3

Bug Closed Minor Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 03/Mar/11 12:38   23/Nov/11 14:22 27/Jul/11 13:21 3.3.3 3.3.4, 3.4.0 tests   0 1   CnxManagerTest.testWorkerThreads

See attachment. This is the first time I've seen this test fail, and it has failed in 2 of the last 3 test runs.

Notice (see attachment) that once this happens, the port never becomes available.

{noformat}
2011-03-02 15:53:12,425 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11245:NIOServerCnxn$Factory@251] - Accepted socket connection from /172.29.6.162:51441
2011-03-02 15:53:12,430 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11245:NIOServerCnxn@639] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2011-03-02 15:53:12,430 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11245:NIOServerCnxn@1435] - Closed socket connection for client /172.29.6.162:51441 (no session established for client)
2011-03-02 15:53:12,430 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:11241:Follower@82] - Exception when following the leader
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:148)
at org.apache.zookeeper.server.quorum.Learner.registerWithLeader(Learner.java:267)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:66)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:645)
2011-03-02 15:53:12,431 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:11241:Follower@165] - shutdown called
java.lang.Exception: shutdown Follower
at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:165)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:649)
2011-03-02 15:53:12,432 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:11241:QuorumPeer@621] - LOOKING
2011-03-02 15:53:12,432 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:11241:FastLeaderElection@663] - New election. My id = 0, Proposed zxid = 0
2011-03-02 15:53:12,433 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 0 (n.leader), 0 (n.zxid), 2 (n.round), LOOKING (n.state), 0 (n.sid), LOOKING (my state)
2011-03-02 15:53:12,433 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 0 (n.leader), 0 (n.zxid), 2 (n.round), LOOKING (n.state), 0 (n.sid), LOOKING (my state)
2011-03-02 15:53:12,433 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 0 (n.leader), 0 (n.zxid), 2 (n.round), LOOKING (n.state), 0 (n.sid), LOOKING (my state)
2011-03-02 15:53:12,633 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 0 (n.leader), 0 (n.zxid), 2 (n.round), LOOKING (n.state), 0 (n.sid), LOOKING (my state)
2011-03-02 15:53:12,633 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:11245:QuorumPeer@655] - LEADING
2011-03-02 15:53:12,636 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:11245:Leader@54] - TCP NoDelay set to: true
2011-03-02 15:53:12,638 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:11245:ZooKeeperServer@151] - Created server with tickTime 1000 minSessionTimeout 2000 maxSessionTimeout 20000 datadir /var/lib/hudson/workspace/CDH3-ZooKeeper-3.3.3_sles/build/test/tmp/test9001250572426375869.junit.dir/version-2 snapdir /var/lib/hudson/workspace/CDH3-ZooKeeper-3.3.3_sles/build/test/tmp/test9001250572426375869.junit.dir/version-2
2011-03-02 15:53:12,639 - ERROR [QuorumPeer:/0:0:0:0:0:0:0:0:11245:Leader@133] - Couldn't bind to port 11245
java.net.BindException: Address already in use
at java.net.PlainSocketImpl.socketBind(Native Method)
at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:365)
at java.net.ServerSocket.bind(ServerSocket.java:319)
at java.net.ServerSocket.<init>(ServerSocket.java:185)
at java.net.ServerSocket.<init>(ServerSocket.java:97)
at org.apache.zookeeper.server.quorum.Leader.<init>(Leader.java:131)
at org.apache.zookeeper.server.quorum.QuorumPeer.makeLeader(QuorumPeer.java:512)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:657)
{noformat}
47516 No Perforce job exists for this issue. 4 32774
8 years, 35 weeks, 1 day ago Turns out this is a bug in the test; the supplied patch fixes the problem by polling rather than sleeping for a fixed interval.
Reviewed
0|i05z13:
ZooKeeper ZOOKEEPER-1005

Zookeeper servers fail to elect a leader successfully.

Bug Open Major Unresolved Unassigned Alexandre Hardy Alexandre Hardy 01/Mar/11 11:00   05/Feb/20 07:16   3.2.2 3.7.0, 3.5.8 quorum   1 3   zookeeper-3.2.2; debian We were running 3 zookeeper servers, and simulated a failure on one of the servers.

One ZooKeeper node follows the other, but has trouble connecting. The following exception appears to be the cause:
{noformat}
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 INFO [zookeeper] -- [org.apache.zookeeper.server.quorum.QuorumPeer] FOLLOWING
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 INFO [zookeeper] -- [org.apache.zookeeper.server.ZooKeeperServer] Created server
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 INFO [zookeeper] -- [org.apache.zookeeper.server.quorum.Follower] Following zookeeper3/192.168.131.11:2888
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING [zookeeper] -- [org.apache.zookeeper.server.quorum.Follower] Unexpected exception, tries=0
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING java.net.ConnectException: -- Connection refused
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at java.net.PlainSocketImpl.socketConnect(Native Method)
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:310)
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:176)
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:163)
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at java.net.Socket.connect(Socket.java:546)
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:156)
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:549)
{noformat}
The last exception while connecting was:
{noformat}
2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR [zookeeper] -- [org.apache.zookeeper.server.quorum.Follower] Unexpected exception
2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR java.net.ConnectException: -- Connection refused
2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at java.net.PlainSocketImpl.socketConnect(Native Method)
2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:310)
2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:176)
2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:163)
2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384)
2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at java.net.Socket.connect(Socket.java:546)
2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:156)
2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:549)
2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 WARNING [zookeeper] -- [org.apache.zookeeper.server.quorum.Follower] Exception when following the leader
{noformat}

The leader started leading a bit later:
{noformat}
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] -- [org.apache.zookeeper.server.quorum.FastLeaderElection] Notification: 0, 94489312534, 25, 2, LOOKING, LOOKING, 0
2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] -- [org.apache.zookeeper.server.quorum.FastLeaderElection] Adding vote
2011-03-01T14:02:32+02:00 e0-cb-4e-65-4d-7d WARNING [zookeeper] -- [org.apache.zookeeper.server.quorum.QuorumCnxManager] Cannot open channel to 1 at election address zookeeper2/192.168.132.10:3888
2011-03-01T14:02:32+02:00 e0-cb-4e-65-4d-7d WARNING -- at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:323)
2011-03-01T14:02:50+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] -- [org.apache.zookeeper.server.quorum.QuorumPeer] LEADING
{noformat}

But at that time the follower had already terminated and started a new election, so the leader failed:
{noformat}
2011-03-01T14:02:50+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] -- [org.apache.zookeeper.server.ZooKeeperServer] Created server
2011-03-01T14:02:50+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] -- [org.apache.zookeeper.server.persistence.FileSnap] Reading snapshot /var/lib/zookeeper/version-2/snapshot.1600007d16
2011-03-01T14:02:50+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] -- [org.apache.zookeeper.server.persistence.FileTxnSnapLog] Snapshotting: 1600007d16
2011-03-01T14:02:53+02:00 e0-cb-4e-65-4d-7d WARNING [zookeeper] -- [org.apache.zookeeper.server.quorum.QuorumCnxManager] Cannot open channel to 1 at election address zookeeper2/192.168.132.10:3888
2011-03-01T14:02:53+02:00 e0-cb-4e-65-4d-7d WARNING -- at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:323)
2011-03-01T14:02:53+02:00 e0-cb-4e-65-4d-7d WARNING -- at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:302)
2011-03-01T14:02:53+02:00 e0-cb-4e-65-4d-7d WARNING -- at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:323)
2011-03-01T14:02:53+02:00 e0-cb-4e-65-4d-7d WARNING -- at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:296)
2011-03-01T14:02:53+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] -- [org.apache.zookeeper.server.quorum.FastLeaderElection] Sending new notification.
2011-03-01T14:03:11+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] -- [org.apache.zookeeper.server.quorum.FastLeaderElection] Sending new notification.
2011-03-01T14:03:14+02:00 e0-cb-4e-65-4d-7d WARNING [zookeeper] -- [org.apache.zookeeper.server.quorum.QuorumCnxManager] Cannot open channel to 1 at election address zookeeper2/192.168.132.10:3888
2011-03-01T14:03:14+02:00 e0-cb-4e-65-4d-7d WARNING -- at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:323)
2011-03-01T14:03:14+02:00 e0-cb-4e-65-4d-7d WARNING -- at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:302)
2011-03-01T14:03:14+02:00 e0-cb-4e-65-4d-7d WARNING -- at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:323)
2011-03-01T14:03:14+02:00 e0-cb-4e-65-4d-7d WARNING -- at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:296)
2011-03-01T14:03:14+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] -- [org.apache.zookeeper.server.quorum.FastLeaderElection] Sending new notification.
2011-03-01T14:03:32+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] -- [org.apache.zookeeper.server.quorum.FastLeaderElection] Sending new notification.
2011-03-01T14:03:34+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] -- [org.apache.zookeeper.server.quorum.Leader] Shutdown called
2011-03-01T14:03:34+02:00 e0-cb-4e-65-4d-7d INFO -- at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:371)
2011-03-01T14:03:34+02:00 e0-cb-4e-65-4d-7d INFO -- at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:297)
2011-03-01T14:03:34+02:00 e0-cb-4e-65-4d-7d INFO -- at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:562)
{noformat}

From http://zookeeper.apache.org/doc/r3.2.2/zookeeperStarted.html:
{quote}
The new entry, initLimit is timeouts ZooKeeper uses to limit the length of time the ZooKeeper servers in quorum have to connect to a leader
{quote}

Since we have initLimit=10 and tickTime=4000, we should have 40 seconds for a zookeeper server to contact the leader.

However, in the source code src/java/main/org/apache/zookeeper/server/quorum/Follower.java:

{noformat}
152 for (int tries = 0; tries < 5; tries++) {
153 try {
154 //sock = new Socket();
155 //sock.setSoTimeout(self.tickTime * self.initLimit);
156 sock.connect(addr, self.tickTime * self.syncLimit);
157 sock.setTcpNoDelay(nodelay);
158 break;
159 } catch (IOException e) {
160 if (tries == 4) {
161 LOG.error("Unexpected exception",e);
162 throw e;
163 } else {
164 LOG.warn("Unexpected exception, tries="+tries,e);
165 sock = new Socket();
166 sock.setSoTimeout(self.tickTime * self.initLimit);
167 }
168 }
169 Thread.sleep(1000);
170 }
{noformat}

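It appears as if we only have about 4 seconds to contact the leader: there are five attempts separated by four 1-second sleeps, and a refused connection fails immediately instead of waiting out the connect timeout (this matches the logs above, where the first failure is at 14:02:29 and the last at 14:02:33). The timeouts are applied to the socket, but they do not take into account that the leader may not have started accepting connections yet.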

Is this the expected behaviour? Or is the expected behaviour that followers should always be able to connect to the leader?
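
One possible approach, sketched under the assumption that the intended budget is initLimit * tickTime (10 * 4000 ms = 40 s in the configuration above): keep retrying until a deadline derived from that budget, instead of for a fixed five attempts. DeadlineConnect and its parameters are hypothetical; this is not ZooKeeper's Follower or Learner code.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper; not ZooKeeper's Follower/Learner implementation.
final class DeadlineConnect {
    private DeadlineConnect() {}

    /**
     * Retries a TCP connect until it succeeds or totalBudgetMs has elapsed.
     * Each attempt uses a fresh Socket, as the original loop also does after a failure.
     */
    static Socket connect(InetSocketAddress addr, long totalBudgetMs,
                          int perAttemptTimeoutMs, long backoffMs)
            throws IOException, InterruptedException {
        final long deadline = System.nanoTime() + totalBudgetMs * 1_000_000L;
        while (true) {
            Socket sock = new Socket();
            try {
                sock.connect(addr, perAttemptTimeoutMs);
                return sock;
            } catch (IOException e) {
                sock.close();
                long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMs <= 0) {
                    throw e; // budget exhausted: report the last failure
                }
                // A refused connection fails immediately, so wait before retrying.
                Thread.sleep(Math.min(backoffMs, remainingMs));
            }
        }
    }
}
```

With the reporter's settings this would be called with totalBudgetMs = 40000, a per-attempt timeout such as tickTime * syncLimit, and a backoff of 1000 ms, so a leader that starts listening within 40 seconds would still be reached.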
2467 No Perforce job exists for this issue. 0 32775
8 years, 41 weeks, 1 day ago 0|i05z1b:
ZooKeeper ZOOKEEPER-1004

TestClient.cc:363: Assertion: equality assertion failed

Bug Open Major Unresolved Unassigned Eugene Joseph Koontz Eugene Joseph Koontz 28/Feb/11 19:14   05/Dec/11 17:57           0 0   Jenkins (Hudson) shows an error when running test-cppunit. I am not able to replicate this error on my own build machine, so I am unable to diagnose it. Perhaps someone with access to the Apache Jenkins machines can investigate. Please see attached output from https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/163//console (click on "full" to see the attached output if your browser can handle that much text).

2468 No Perforce job exists for this issue. 0 32776
8 years, 16 weeks, 3 days ago 0|i05z1j:
ZooKeeper ZOOKEEPER-1003

provide a separate client library jar

Wish Resolved Major Duplicate Unassigned Jean-Pierre Koenig Jean-Pierre Koenig 24/Feb/11 02:57   01/Nov/11 06:07 01/Nov/11 06:07         0 3   This feature request applies to ZooKeeper, HBase, Hadoop and maybe other
projects. Currently, to use one of these projects, I need to include one big
jar file as a dependency, that

- contains the complete server code,
- contains much more code than I use
- and, most annoyingly, depends on many other jars that are mostly needed by the
server but not by the client library.

Thus when using maven and including any of the mentioned projects, the
dependency graph of my projects grows unnecessarily large.

This is a severe problem for at least two reasons:
- The probability of conflicting dependency versions increases.
- Especially for MapReduce jobs depending on HBase or ZooKeeper, the jars shipped to the
clusters carry 20-30MB or more of unnecessary dependencies.

One could work around the problem with Maven dependency exclusions, but this may lead to unpredictable runtime errors (ClassNotFound), since an exclusion that compiles cleanly can still remove a class that is needed at runtime.

I wish we could solve the underlying issue at the root with a client library.
client, dependencies, library, maven 2469 No Perforce job exists for this issue. 0 33348
8 years, 21 weeks, 2 days ago 0|i062kn:
ZooKeeper ZOOKEEPER-1002

The Barrier sample code should create an EPHEMERAL znode instead of an EPHEMERAL_SEQUENTIAL znode

Bug Resolved Minor Invalid Ching-Shen Chen Ching-Shen Chen Ching-Shen Chen 22/Feb/11 21:02   23/Apr/14 18:26 23/Apr/14 18:26 3.3.2 3.4.7, 3.5.0 documentation   0 4   Please see the Barrier sample code in the ZooKeeper Tutorial (http://zookeeper.apache.org/doc/r3.3.1/zookeeperTutorial.html#sc_barriers), which should enable a group of processes to synchronize the beginning and the end of a computation. documentation 2470 No Perforce job exists for this issue. 1 32777
5 years, 48 weeks, 1 day ago 0|i05z1r:
ZooKeeper ZOOKEEPER-1000

Provide SSL in zookeeper to be able to run cross colos.

Improvement Resolved Major Duplicate Mahadev Konar Mahadev Konar Mahadev Konar 21/Feb/11 21:26   11/Sep/19 16:33 21/May/19 22:20         26 51   This JIRA is to track SSL for ZooKeeper. Both inter-server communication and client-to-server communication should be over SSL so that ZooKeeper can be deployed over WANs. 2471 No Perforce job exists for this issue. 0 42079
43 weeks, 1 day ago 0|i07kfz:
ZooKeeper ZOOKEEPER-999

Create a package integration project

New Feature Closed Major Fixed Eric Yang Eric Yang Eric Yang 21/Feb/11 20:36   23/Nov/11 14:22 29/Aug/11 17:52   3.4.0 build   0 2   Java 6, RHEL/Ubuntu The goal of this ticket is to generate a set of RPM/Debian packages that integrate well with the RPM sets created by HADOOP-6255. 47517 No Perforce job exists for this issue. 14 33349
8 years, 30 weeks, 2 days ago Create zookeeper rpm and deb packages.
Reviewed
0|i062kv:
ZooKeeper ZOOKEEPER-997

ZkClient ignores a command if there is any whitespace in front of it

Improvement Closed Trivial Duplicate Laxman Alex Alex 21/Feb/11 14:05   23/Nov/11 14:22 12/Oct/11 00:43 3.3.2 3.4.0 java client   0 3   CentOS release 5.5 (Final) ZkClient ignores a command if there is any whitespace in front of it.

For example, entering " ls /" (note the space in front of ls)
prints the following usage output instead of running the command:

ZooKeeper -server host:port cmd args
connect host:port
get path [watch]
ls path [watch]
...
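One plausible cause, which is an assumption not confirmed in this report, is that the shell splits the input line on single spaces, so a leading space produces an empty first token that matches no command and the usage text is printed. A minimal sketch contrasting that naive split with a whitespace-tolerant one (CommandLineSketch is a hypothetical class, not ZooKeeperMain):

```java
class CommandLineSketch {
    // Naive tokenizing: " ls /".split(" ") yields ["", "ls", "/"],
    // so the "command" is the empty string.
    static String[] naiveSplit(String line) {
        return line.split(" ");
    }

    // Tolerant tokenizing: trim first, then split on runs of whitespace.
    static String[] tolerantSplit(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }
}
```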
2472 No Perforce job exists for this issue. 0 33350
8 years, 26 weeks, 1 day ago 0|i062l3:
ZooKeeper ZOOKEEPER-996

ZkClient: stat on non-existing node causes NPE

Bug Resolved Trivial Duplicate Unassigned Alex Alex 21/Feb/11 14:02   27/May/11 12:08 27/May/11 12:08 3.3.2   java client   0 0   CentOS release 5.5 (Final) stat on a non-existing node causes an NPE and the client quits:

stat /aa
Exception in thread "main" java.lang.NullPointerException
at org.apache.zookeeper.ZooKeeperMain.printStat(ZooKeeperMain.java:130)
at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:722)
at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:581)
at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:353)
at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:311)
at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:270)
214213 No Perforce job exists for this issue. 0 32778
8 years, 43 weeks, 6 days ago 0|i05z1z:
ZooKeeper ZOOKEEPER-995

C Client exposing chroot information

Bug Resolved Major Duplicate Unassigned Andrei Savu Andrei Savu 21/Feb/11 08:22   24/Apr/14 20:33 24/Apr/14 20:33     c client   0 1   $ uname -a
Linux kaizen 2.6.35-25-generic #44-Ubuntu SMP Fri Jan 21 17:40:48 UTC 2011 i686 GNU/Linux

$ java -version
java version "1.6.0_22"
Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
Java HotSpot(TM) Server VM (build 17.1-b03, mixed mode)

$ python -c "import zookeeper;print zookeeper.__version__"
3.4.0

(latest zookeeper from the trunk)
When creating a new node over a chrooted connection, the client function returns the full path instead of the path with the chroot prefix removed. I encountered this while using zkpython, which is why I suspect the problem is in the C bindings. The Java client does not seem to be affected (only tested using the command line interface). I will also attach a patch with a failing test. 2473 No Perforce job exists for this issue. 1 32779
5 years, 48 weeks ago 0|i05z27:
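For reference, a minimal sketch of the translation a chrooted client is expected to apply to the path returned by create: remove the chroot prefix so the caller sees a path relative to its chroot. ChrootPaths and stripChroot are hypothetical names; this is not the C client's implementation, and the chroot is assumed to have no trailing slash.

```java
class ChrootPaths {
    /**
     * Converts a server-side path back to the client's chrooted view.
     * chroot is e.g. "/app" (no trailing slash), or null when no chroot is used.
     */
    static String stripChroot(String chroot, String serverPath) {
        if (chroot == null || chroot.isEmpty()) {
            return serverPath;                           // no chroot: nothing to strip
        }
        if (serverPath.equals(chroot)) {
            return "/";                                  // the chroot itself is "/" to the client
        }
        if (serverPath.startsWith(chroot + "/")) {
            return serverPath.substring(chroot.length()); // "/app/x" -> "/x"
        }
        return serverPath;                               // not under the chroot; leave untouched
    }
}
```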
ZooKeeper ZOOKEEPER-994

"eclipse" target in the build script does not include the library required by the test classes in the classpath

Bug Closed Minor Fixed MIS MIS MIS 17/Feb/11 14:38   23/Nov/11 14:22 27/Feb/11 02:11 3.3.2 3.4.0 build   0 1 1800 1800 0% Linux box, Eclipse IDE The "eclipse" target in the ZooKeeper build script doesn't add accessive.jar (located in the folder /src/java/libtest) to the .classpath file, but accessive.jar is referenced by a couple of test classes.
However, the build is successful :)
0% 0% 1800 1800 47518 No Perforce job exists for this issue. 1 32780
9 years, 4 weeks, 2 days ago
Reviewed
0|i05z2f:
ZooKeeper ZOOKEEPER-993

Code improvements

Improvement Closed Minor Fixed MIS MIS MIS 17/Feb/11 13:31   23/Nov/11 14:22 16/Mar/11 11:59 3.3.2, 3.3.3 3.4.0 leaderElection   0 0 1800 1800 0% Linux box, Eclipse IDE, In the file org.apache.zookeeper.server.quorum.FastLeaderElection.java, methods such as totalOrderPredicate and termPredicate, which return boolean, contain code like the following:

if (condition)
    return true;
else
    return false;

It would be better to return the condition itself,
i.e., return condition;

The same applies elsewhere, where applicable.
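A minimal sketch of the suggested refactoring. The condition used here (a higher zxid wins, and ties are broken by the higher server id) is modeled loosely on totalOrderPredicate and may not match the exact expression in the affected versions; PredicateSketch is a hypothetical class.

```java
class PredicateSketch {
    // Verbose form, as described in the issue.
    static boolean newerVerbose(long newZxid, long newId, long curZxid, long curId) {
        if ((newZxid > curZxid) || ((newZxid == curZxid) && (newId > curId)))
            return true;
        else
            return false;
    }

    // Suggested form: return the boolean expression directly.
    static boolean newer(long newZxid, long newId, long curZxid, long curId) {
        return newZxid > curZxid || (newZxid == curZxid && newId > curId);
    }
}
```

Both methods are equivalent for every input; the second simply removes the redundant branch.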
0% 0% 1800 1800 47519 No Perforce job exists for this issue. 1 33351
9 years, 2 weeks ago 0|i062lb:
ZooKeeper ZOOKEEPER-992

MT Native Version of Windows C Client

New Feature Closed Major Fixed Dheeraj Agrawal Camille Fournier Camille Fournier 17/Feb/11 11:29   23/Nov/11 14:22 18/Jul/11 20:59   3.4.0 c client   2 4   Windows 32 This is an extension of the work in https://issues.apache.org/jira/browse/ZOOKEEPER-859
47520 No Perforce job exists for this issue. 11 33352
8 years, 28 weeks, 1 day ago 0|i062lj:
ZooKeeper ZOOKEEPER-991

QuorumPeer.OBSERVER_ID

Bug Open Major Unresolved Unassigned Sandeep Maheshwari Sandeep Maheshwari 14/Feb/11 01:45   05/Feb/20 07:17   3.3.2 3.7.0, 3.5.8 quorum   0 0   Windows I don't understand why we need this code in the first place:

if (remoteSid == QuorumPeer.OBSERVER_ID) {
    /*
     * Choose identifier at random. We need a value to identify
     * the connection.
     */

    remoteSid = observerCounter--;
    initializeMessageQueue(remoteSid);
    LOG.info("Setting arbitrary identifier to observer: " + remoteSid);
}

Even if we remove the above code from the public Long readRemoteServerID(Socket sock) {} function, FLE will still work correctly: when any other peer (a PARTICIPANT) receives a notification from the observer, that peer will not consider the observer's vote because of this check:

if(!self.getVotingView().containsKey(response.sid))

Hence there is no need for that code. Moreover, the above code can create redundant threads (SendWorker-ReceiveWorker): each time the same observer initiates a connection with the same peer, we execute (sid = observerCounter--;), so the same observer gets a different sid and a new pair of threads is created that serves no purpose.

Please let me know if I am correct.
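A minimal model of the concern, using hypothetical names (ObserverIdSketch, accept, connectionsBySid) and assuming that OBSERVER_ID is Long.MAX_VALUE and that the counter starts at -1. A participant that reconnects keeps its sid, so its entry is reused; an observer that reconnects is given a new id every time, so entries (and, in the real class, worker threads) accumulate.

```java
import java.util.HashMap;
import java.util.Map;

class ObserverIdSketch {
    // Stand-in for QuorumPeer.OBSERVER_ID (assumed here to be Long.MAX_VALUE).
    static final long OBSERVER_ID = Long.MAX_VALUE;

    // Mirrors the quoted code: arbitrary ids are handed out by counting down.
    private long observerCounter = -1;

    // sid -> number of connections accepted under that sid. In the real class
    // each key would own a SendWorker/RecvWorker pair; here we only count.
    final Map<Long, Integer> connectionsBySid = new HashMap<>();

    long accept(long remoteSid) {
        if (remoteSid == OBSERVER_ID) {
            remoteSid = observerCounter--; // a reconnecting observer gets a fresh id every time
        }
        connectionsBySid.merge(remoteSid, 1, Integer::sum);
        return remoteSid;
    }
}
```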
2474 No Perforce job exists for this issue. 0 32781
9 years, 6 weeks, 2 days ago 0|i05z2n:
ZooKeeper ZOOKEEPER-990

random session timeout when there is a large number of sessions

Bug Open Major Unresolved Unassigned Xiaowei Jiang Xiaowei Jiang 13/Feb/11 19:20   14/Feb/11 12:04   3.3.2   server   0 1   When there is a large number of sessions, random session timeouts start occurring after a few hours. This happens even though the load on the server is small (fewer than 1 of 8 processors busy and plenty of memory). Increasing the timeout to 300 seconds only delays the problem; the session timeouts still eventually happen. 2475 No Perforce job exists for this issue. 0 32782
9 years, 6 weeks, 3 days ago 0|i05z2v:
ZooKeeper ZOOKEEPER-989

ZK servers not balanced in number of sessions

Bug Open Minor Unresolved Unassigned Xiaowei Jiang Xiaowei Jiang 13/Feb/11 19:16   19/Mar/11 16:15   3.3.2   c client   0 1   In a 5-machine ZK cluster, when there is a large number of sessions, the 1st server seems to get more sessions.

The 1st server gets 25% of the sessions, while each of the remaining four gets 18.75%.
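For comparison, with 5 servers an even spread is 20% per server, and the reported numbers add up (25% + 4 * 18.75% = 100%), so the first server carries 1.25 times its even share. A minimal simulation of the baseline, assuming each client picks a server uniformly at random (the actual client-side selection policy is not described here); SessionSpreadSketch is a hypothetical class.

```java
import java.util.Random;

class SessionSpreadSketch {
    /** Percentage of sessions landing on each server if every client picks one uniformly at random. */
    static double[] simulate(int servers, int sessions, long seed) {
        Random rnd = new Random(seed);
        int[] counts = new int[servers];
        for (int i = 0; i < sessions; i++) {
            counts[rnd.nextInt(servers)]++; // one session per simulated client
        }
        double[] shares = new double[servers];
        for (int s = 0; s < servers; s++) {
            shares[s] = 100.0 * counts[s] / sessions;
        }
        return shares;
    }
}
```

With 100,000 simulated sessions the per-server shares stay very close to 20%, so a persistent 25% on one server points at something other than random variation.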
2476 No Perforce job exists for this issue. 0 32783
9 years, 1 week, 5 days ago 0|i05z33:
ZooKeeper ZOOKEEPER-988

ZK server hang on leader election

Bug Resolved Major Incomplete Unassigned Xiaowei Jiang Xiaowei Jiang 13/Feb/11 19:13   14/Oct/13 19:52 14/Oct/13 19:52 3.3.2   leaderElection   0 2   The org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run thread exited unexpectedly, so the server hung in leader election.

QuorumPeer:/0.0.0.0:2181:
[1] sun.misc.Unsafe.park (native method)
[2] java.util.concurrent.locks.LockSupport.parkNanos (LockSupport.java:198)
[3] java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos (AbstractQueuedSynchronizer.java:1,963)
[4] java.util.concurrent.LinkedBlockingQueue.poll (LinkedBlockingQueue.java:395)
[5] org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader (FastLeaderElection.java:677)
[6] org.apache.zookeeper.server.quorum.QuorumPeer.run (QuorumPeer.java:621)
2477 No Perforce job exists for this issue. 0 32784
9 years, 6 weeks, 3 days ago 0|i05z3b:
ZooKeeper ZOOKEEPER-987

Fatal error after reelection

Bug Resolved Major Not A Problem Unassigned Xiaowei Jiang Xiaowei Jiang 13/Feb/11 19:10   14/Feb/11 01:14 14/Feb/11 01:14 3.3.2   server   0 0   The ZK server hit a fatal error after leader re-election:

2011-01-17 14:38:29,709 - DEBUG [WorkerSender Thread:QuorumCnxManager@384] - There is a connection already for server 4
2011-01-17 14:38:30,111 - DEBUG [WorkerReceiver Thread:FastLeaderElection$Messenger$WorkerReceiver@214] - Receive new notification message. My id = 1
2011-01-17 14:38:30,111 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 4 (n.leader), 8589936845 (n.zxid), 6 (n.round), LOOKING (n.state), 4 (n.sid), FOLLOWING (my state)
2011-01-17 14:38:30,111 - DEBUG [WorkerReceiver Thread:FastLeaderElection$Messenger$WorkerReceiver@288] - Sending new notification. My id = 1, Recipient = 4
2011-01-17 14:38:30,112 - DEBUG [WorkerSender Thread:QuorumCnxManager@384] - There is a connection already for server 4
2011-01-17 14:38:34,115 - INFO [QuorumPeer:/0.0.0.0:2181:Learner@315] - Setting leader epoch 3
2011-01-17 14:38:34,117 - WARN [QuorumPeer:/0.0.0.0:2181:Follower@116] - Got zxid 0x2000008ce expected 0x1
2011-01-17 14:38:34,117 - INFO [QuorumPeer:/0.0.0.0:2181:FileTxnSnapLog@208] - Snapshotting: 300000000
2011-01-17 14:38:37,346 - WARN [QuorumPeer:/0.0.0.0:2181:Follower@116] - Got zxid 0x300000001 expected 0x2000008cf
2011-01-17 14:38:37,988 - FATAL [QuorumPeer:/0.0.0.0:2181:FollowerZooKeeperServer@112] - Committing zxid 0x300000001 but next pending txn 0x2000008ce
214212 No Perforce job exists for this issue. 0 32785
9 years, 6 weeks, 3 days ago 0|i05z3j:
ZooKeeper ZOOKEEPER-986

In QuorumCnxManager we add each sent message to lastMessageSent but never remove it after sending, so the same message will be sent again in the next round

Bug Resolved Minor Not A Problem Unassigned Sandeep Maheshwari Sandeep Maheshwari 11/Feb/11 07:04   19/May/14 17:42 19/May/14 17:42 3.3.2 3.5.0 quorum   0 1   Windows The function that sends notification messages to the corresponding peer during leader election:

private void processMessages() throws Exception {
    try {
        ByteBuffer b = getLastMessageSent(sid);
        if (b != null) {
            send(b);
        }
    } catch (IOException e) {
        LOG.error("Failed to send last message to " + sid, e);
        throw e;
    }
    try {
        ArrayBlockingQueue<ByteBuffer> bq = queueSendMap.get(sid);
        if (bq == null) {
            dumpQueueSendMap();
            throw new Exception("No queue for incoming messages for " +
                    "sid=" + sid);
        }
        while (running && !shutdown && sock != null) {
            ByteBuffer b = null;
            try {
                b = bq.poll(1000, TimeUnit.MILLISECONDS);
                if(b != null){
                    recordLastMessageSent(sid, b);
                    send(b);
                }
            } catch (InterruptedException e) {
                LOG.warn("Interrupted while waiting for message on " +
                        "queue", e);
            }
        }
    } catch (Exception e) {
        LOG.warn("Exception when using channel: for id " + sid
                + " my id = " + self.getId() + " error = ", e);
        throw e;
    }
}

This code is taken from the patch for ZOOKEEPER-932.
Here we add the message being sent in the current round to lastMessageSent, but in the next round that message is still there. So when we try to send a new message to the server, it will again do

    ByteBuffer b = getLastMessageSent(sid);
    if (b != null) {
        send(b);
    }
and send that old message to the server again. In this way every message is sent twice. Although this does not affect the correctness of FLE, sending each message twice creates extra overhead and slows down the election process.
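A small model of the quoted method, with strings standing in for ByteBuffers and a list standing in for the socket (LastMessageSentSketch and its members are hypothetical). It shows what the quoted code does on its own terms: the recorded message is resent once each time processMessages() is entered, not once per message inside the polling loop. How often the method is entered in practice (assumed here: once per new send worker/connection) determines how many duplicates actually occur.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class LastMessageSentSketch {
    final Map<Long, String> lastMessageSent = new ConcurrentHashMap<>();
    final List<String> wire = new ArrayList<>(); // everything "sent", in order

    /** Models one entry into processMessages() for the given sid. */
    void processMessages(long sid, List<String> queued) {
        String last = lastMessageSent.get(sid);
        if (last != null) {
            wire.add(last);              // resend of the recorded message at the top of the method
        }
        for (String m : queued) {
            lastMessageSent.put(sid, m); // recordLastMessageSent(sid, b)
            wire.add(m);                 // send(b)
        }
    }
}
```

In this model, two entries into the method for the same sid duplicate only the last message of the first batch.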
gsoc 36636 No Perforce job exists for this issue. 0 32786
5 years, 44 weeks, 3 days ago 0|i05z3r:
ZooKeeper ZOOKEEPER-985

Test BookieRecoveryTest fails on trunk.

Bug Closed Major Fixed Flavio Paiva Junqueira Mahadev Konar Mahadev Konar 09/Feb/11 14:24   23/Nov/11 14:22 18/Feb/11 12:55   3.3.3, 3.4.0 contrib-bookkeeper   0 1   Darwin moststock-lm 9.7.0 Darwin Kernel Version 9.7.0: Tue Mar 31 22:52:17 PDT 2009; root:xnu-1228.12.14~1/RELEASE_I386 i386 (mac). The unit test fails on trunk on my Mac. It might fail on other platforms as well. I'll attach the error logs. 47521 No Perforce job exists for this issue. 3 32787
9 years, 5 weeks, 5 days ago
Reviewed
0|i05z3z:
ZooKeeper ZOOKEEPER-984

jenkins failure in testSessionMoved - NPE in quorum

Bug Resolved Blocker Cannot Reproduce Unassigned Patrick D. Hunt Patrick D. Hunt 07/Feb/11 13:34   28/Feb/19 14:47 24/Dec/13 05:39 3.3.2 3.5.0     0 6   I got the following NPE on my internal Jenkins setup, running against the released 3.3.2 (see the attached log):

{noformat}
[junit] 2011-02-06 10:39:56,988 - WARN [QuorumPeer:/0.0.0.0:11365:Follower@116] - Got zxid 0x100000001 expected 0x1
[junit] 2011-02-06 10:39:56,988 - INFO [SyncThread:3:FileTxnLog@197] - Creating new log file: log.100000001
[junit] 2011-02-06 10:39:56,989 - WARN [QuorumPeer:/0.0.0.0:11364:Follower@116] - Got zxid 0x100000001 expected 0x1
[junit] 2011-02-06 10:39:56,989 - INFO [SyncThread:2:FileTxnLog@197] - Creating new log file: log.100000001
[junit] 2011-02-06 10:39:56,990 - WARN [QuorumPeer:/0.0.0.0:11363:Follower@116] - Got zxid 0x100000001 expected 0x1
[junit] 2011-02-06 10:39:56,990 - INFO [SyncThread:5:FileTxnLog@197] - Creating new log file: log.100000001
[junit] 2011-02-06 10:39:56,990 - WARN [QuorumPeer:/0.0.0.0:11366:Follower@116] - Got zxid 0x100000001 expected 0x1
[junit] 2011-02-06 10:39:56,990 - INFO [SyncThread:1:FileTxnLog@197] - Creating new log file: log.100000001
[junit] 2011-02-06 10:39:56,991 - INFO [SyncThread:4:FileTxnLog@197] - Creating new log file: log.100000001
[junit] 2011-02-06 10:39:56,995 - INFO [main-SendThread(localhost.localdomain:11363):ClientCnxn$SendThread@738] - Session establishment complete on server localhost.localdomain/127.0.0.1:11363, sessionid = 0x12dfc45e6dd0000, negotiated timeout = 30000
[junit] 2011-02-06 10:39:56,996 - INFO [CommitProcessor:1:NIOServerCnxn@1580] - Established session 0x12dfc45e6dd0000 with negotiated timeout 30000 for client /127.0.0.1:37810
[junit] 2011-02-06 10:39:56,999 - INFO [main:ZooKeeper@436] - Initiating client connection, connectString=127.0.0.1:11364 sessionTimeout=30000 watcher=org.apache.zookeeper.test.QuorumTest$5@248523a0 sessionId=85001345146093568 sessionPasswd=<hidden>
[junit] 2011-02-06 10:39:57,000 - INFO [main-SendThread():ClientCnxn$SendThread@1041] - Opening socket connection to server /127.0.0.1:11364
[junit] 2011-02-06 10:39:57,000 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11364:NIOServerCnxn$Factory@251] - Accepted socket connection from /127.0.0.1:36682
[junit] 2011-02-06 10:39:57,001 - INFO [main-SendThread(localhost.localdomain:11364):ClientCnxn$SendThread@949] - Socket connection established to localhost.localdomain/127.0.0.1:11364, initiating session
[junit] 2011-02-06 10:39:57,002 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11364:NIOServerCnxn@770] - Client attempting to renew session 0x12dfc45e6dd0000 at /127.0.0.1:36682
[junit] 2011-02-06 10:39:57,002 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11364:Learner@95] - Revalidating client: 85001345146093568
[junit] 2011-02-06 10:39:57,003 - INFO [QuorumPeer:/0.0.0.0:11364:NIOServerCnxn@1580] - Established session 0x12dfc45e6dd0000 with negotiated timeout 30000 for client /127.0.0.1:36682
[junit] 2011-02-06 10:39:57,004 - INFO [main-SendThread(localhost.localdomain:11364):ClientCnxn$SendThread@738] - Session establishment complete on server localhost.localdomain/127.0.0.1:11364, sessionid = 0x12dfc45e6dd0000, negotiated timeout = 30000
[junit] 2011-02-06 10:39:57,005 - WARN [CommitProcessor:2:NIOServerCnxn@1524] - Unexpected exception. Destruction averted.
[junit] java.lang.NullPointerException
[junit] at org.apache.jute.BinaryOutputArchive.writeRecord(BinaryOutputArchive.java:123)
[junit] at org.apache.zookeeper.proto.SetDataResponse.serialize(SetDataResponse.java:40)
[junit] at org.apache.jute.BinaryOutputArchive.writeRecord(BinaryOutputArchive.java:123)
[junit] at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1500)
[junit] at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:367)
[junit] at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
[junit] Running org.apache.zookeeper.test.QuorumTest
[junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
[junit] Test org.apache.zookeeper.test.QuorumTest FAILED (timeout)
[junit] 2011-02-06 10:53:26,189 - INFO [main:PortAssignment@31] - assigning port 11221
[junit] 2011-02-06 10:53:26,192 - INFO [main:PortAssignment@31] - assigning port 11222
{noformat}
36637 No Perforce job exists for this issue. 1 32788
1 year, 3 weeks ago 0|i05z47:
ZooKeeper ZOOKEEPER-983

running zkServer.sh start remotely using ssh hangs

Bug Closed Minor Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 03/Feb/11 00:21   23/Nov/11 14:22 27/Feb/11 01:57 3.3.2 3.4.0 scripts   0 2   If zkServer.sh is run remotely using ssh as follows, ssh will "hang", i.e. it will not complete/return once the server has started, even though zkServer.sh starts the Java VM in the background.

$ ssh <host> "zkServer.sh start"

This is due to the following issue:

http://www.slac.stanford.edu/comp/unix/ssh_faq.html#logoff_hangs

37456 No Perforce job exists for this issue. 1 30000
8 years, 42 weeks ago
Reviewed
0|i05hx3:
ZooKeeper ZOOKEEPER-982

zkServer.sh won't start ZooKeeper on an Ubuntu 10.10 system due to a bug in the startup script.

Bug Resolved Minor Invalid Thomas Koch Bjørn Remseth Bjørn Remseth 02/Feb/11 04:52   12/Dec/11 12:47 12/Dec/11 12:47 3.3.1 3.5.0 scripts   0 2   When running "zkServer.sh start", I get these error messages:

====
$sudo sh bin/zkServer.sh start
JMX enabled by default
bin/zkServer.sh: 69: cygpath: not found
Using config:
grep: : No such file or directory
Starting zookeeper ...
STARTED
$ Invalid config, exiting abnormally
====

The "Invalid config..." text is output from the server which terminates immediately after this message has been printed.

The fix is easy: Inside zkServer.sh change the line
====
if $cygwin
====

into

====
if [ -n "$cygwin" ]
====

This fixes the problem and makes the server run.


70801 No Perforce job exists for this issue. 1 32789
8 years, 15 weeks, 5 days ago 0|i05z4f:
ZooKeeper ZOOKEEPER-981

Hang in zookeeper_close() in the multi-threaded C client

Bug Closed Critical Fixed Jeremy Stribling Jeremy Stribling Jeremy Stribling 01/Feb/11 15:23   23/Nov/11 14:22 14/Sep/11 00:10 3.3.2 3.4.0 c client   1 7   Debian Squeeze, Linux 2.6.32-5, x86_64 I saw a hang once when my C++ application called the zookeeper_close() method of the multi-threaded Zookeeper client library. The stack trace of the hung thread was the following:

{quote}
Thread 8 (Thread 5644):
#0 0x00007f5d7bb5bbe4 in __lll_lock_wait () from /lib/libpthread.so.0
#1 0x00007f5d7bb59ad0 in pthread_cond_broadcast@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2 0x00007f5d793628f6 in unlock_completion_list (l=0x32b4d68) at .../zookeeper/src/c/src/mt_adaptor.c:66
#3 0x00007f5d79354d4b in free_completions (zh=0x32b4c80, callCompletion=1, reason=-116) at .../zookeeper/src/c/src/zookeeper.c:1069
#4 0x00007f5d79355008 in cleanup_bufs (zh=0x32b4c80, callCompletion=1, rc=-116) at .../thirdparty/zookeeper/src/c/src/zookeeper.c:1125
#5 0x00007f5d79353200 in destroy (zh=0x32b4c80) at .../thirdparty/zookeeper/src/c/src/zookeeper.c:366
#6 0x00007f5d79358e0e in zookeeper_close (zh=0x32b4c80) at .../zookeeper/src/c/src/zookeeper.c:2326
#7 0x00007f5d79356d18 in api_epilog (zh=0x32b4c80, rc=0) at .../zookeeper/src/c/src/zookeeper.c:1661
#8 0x00007f5d79362f2f in adaptor_finish (zh=0x32b4c80) at .../zookeeper/src/c/src/mt_adaptor.c:205
#9 0x00007f5d79358c8c in zookeeper_close (zh=0x32b4c80) at .../zookeeper/src/c/src/zookeeper.c:2297
...
{quote}

The omitted part of the stack trace is entirely within my application, and contains no other calls to/from the Zookeeper client. In particular, I am not calling zookeeper_close() from within a completion handler or any of the library's threads.

I haven't been able to reproduce this, and when I encountered this I wasn't capturing logging from the client library, so unfortunately I don't have any more information at this time. But I will update this JIRA if I see it again.
47522 No Perforce job exists for this issue. 3 32790
8 years, 18 weeks, 6 days ago
Reviewed
0|i05z4n:
ZooKeeper ZOOKEEPER-980

allow configuration parameters for log4j.properties

Improvement Closed Minor Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 01/Feb/11 03:12   26/Apr/15 14:30 09/Feb/11 18:43   3.4.0     0 1   log4j.properties can contain properties that may be overridden using system properties. Hadoop's bin/hadoop already does this; I will replicate it in ZK's config. 37457 No Perforce job exists for this issue. 1 30004
9 years, 7 weeks ago
Reviewed
0|i05hxz:
ZooKeeper ZOOKEEPER-979

UnknownHostException in QuorumCnxManager

Bug Open Minor Unresolved Unassigned Hugh Warrington Hugh Warrington 27/Jan/11 11:44   28/Jan/11 09:10   3.3.2   server   0 3   I'm using zk 3.3.2 and I'm seeing this in my logs around startup:

2011-01-27 10:16:21,513 [WorkerSender Thread] WARN org.apache.zookeeper.server.quorum.QuorumCnxManager - Cannot open channel to 0 at election address xxx.yyy.com/10.2.131.19:3888
java.net.UnknownHostException
at sun.nio.ch.Net.translateException(Net.java:100)
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:140)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:366)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:335)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:360)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:333)
at java.lang.Thread.run(Thread.java:636)

And all subsequent zk ops give {{ConnectionLossException}}.

I've just explained this to breed_zk on IRC, and he asked me to file a ticket, mentioning that UnknownHostException may sometimes be thrown for reasons other than host resolution. While I'm reasonably certain that the hostname is correct and should be contactable, I need to put some more time into checking our network setup to be absolutely sure. However, two observations arose while looking into this:

* At the top of QuorumCnxManager.connectOne(), we set electionAddr (or fail and return). But then a few lines later we don't actually use this local variable in the call to connect(). This seems like a minor programming mistake (although AFAICT it doesn't change the behaviour).
* In the subsequent catch block, the UnknownHostException that's thrown doesn't contain the address that we were trying to connect to (though if you capture WARN log messages, you can see what it was).
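A minimal sketch of the second observation, assuming a plain java.net.Socket rather than the NIO socket adaptor shown in the trace: catch the UnknownHostException and rethrow one whose message names the address being contacted, keeping the original exception as the cause. ConnectWithAddress is a hypothetical helper, not QuorumCnxManager code.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;

class ConnectWithAddress {
    static Socket connect(InetSocketAddress addr, int timeoutMs) throws IOException {
        Socket sock = new Socket();
        try {
            sock.connect(addr, timeoutMs);
            return sock;
        } catch (UnknownHostException e) {
            sock.close();
            // Put the target address in the message so it is visible without WARN logs.
            UnknownHostException withAddr =
                    new UnknownHostException("Cannot open channel to election address " + addr);
            withAddr.initCause(e);
            throw withAddr;
        } catch (IOException e) {
            sock.close();
            throw e;
        }
    }
}
```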
36638 No Perforce job exists for this issue. 0 32791
9 years, 8 weeks, 6 days ago 0|i05z4v:
ZooKeeper ZOOKEEPER-978

ZookeeperServer does not close zk database on shutdown

Bug Resolved Major Duplicate Thomas Koch Sergei Bobovich Sergei Bobovich 20/Jan/11 12:04   17/May/14 22:33 17/May/14 22:33 3.3.2 3.4.6, 3.5.0 server   0 2   ZookeeperServer does not close the zk database on shutdown, leaving log files open. I am not sure whether this is intentional, but it looks like a possible bug to me. The database is closed only from the QuorumPeer class.
I hit this when running the regression tests on Windows: the cleanup failed to delete the log files.
35 No Perforce job exists for this issue. 2 32792
5 years, 44 weeks, 4 days ago 0|i05z53:
ZooKeeper ZOOKEEPER-977

passing null for path_buffer in zoo_create

Improvement Closed Major Fixed Benjamin Reed Benjamin Reed Benjamin Reed 19/Jan/11 14:43   23/Nov/11 14:22 08/Feb/11 22:14   3.4.0     0 0   It is unclear from the comments for zoo_create whether NULL can be passed for path_buffer. 47523 No Perforce job exists for this issue. 1 33353
9 years, 7 weeks, 1 day ago
Reviewed
0|i062lr:
ZooKeeper ZOOKEEPER-976

ZooKeeper startup script doesn't use JAVA_HOME

Bug Closed Minor Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 17/Jan/11 20:36   23/Nov/11 14:22 27/Feb/11 02:02 3.3.2 3.4.0     0 2   From bug filed on CDH: https://issues.cloudera.org/browse/DISTRO-47 - moving it to this jira to address:

------------------------------------------------------
Bug filed by "grep.alex" at http://getsatisfaction.com/cloudera/topics/cdh3b3_zookeeper_startup_script_doesnt_use_java_home

On RedHat 5 (using the RPM installer) I was able to install and run all the Hadoop components. The Zookeeper install was fine, but it wouldn't start:

{noformat}
[root@aholmes-desktop init.d]# ./hadoop-zookeeper start
JMX enabled by default
Using config: /etc/zookeeper/zoo.cfg
Starting zookeeper ...
STARTED
[root@aholmes-desktop init.d]# Exception in thread "main" java.lang.NoSuchMethodError: method java.lang.management.ManagementFactory.getPlatformMBeanServer with signature ()Ljavax.management.MBeanServer; was not found.
at org.apache.zookeeper.jmx.ManagedUtil.registerLog4jMBeans(ManagedUtil.java:48
...
{noformat}

After some digging around, I found the cause: the Zookeeper startup script (/usr/lib/zookeeper/bin/zkServer.sh) uses the java found in the PATH, whereas the other startup scripts use JAVA_HOME. In my case I had the default RHEL5 1.4 JDK in the PATH and the 1.6 JDK RPMs installed under /usr/java, hence the above error. I'm guessing this is a fairly common setup.

In my opinion, all the startup scripts should use the same mechanism to determine which java to run.
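The selection logic the reporter asks for can be sketched as: prefer $JAVA_HOME/bin/java when JAVA_HOME is set, and otherwise fall back to whatever java is first on the PATH. The real fix would live in the shell scripts; this Java version (JavaLocator is a hypothetical class) only illustrates the decision.

```java
import java.nio.file.Paths;
import java.util.Map;

class JavaLocator {
    /** Returns the java executable to launch, given an environment such as System.getenv(). */
    static String resolveJava(Map<String, String> env) {
        String javaHome = env.get("JAVA_HOME");
        if (javaHome != null && !javaHome.isEmpty()) {
            return Paths.get(javaHome, "bin", "java").toString(); // explicit JDK wins
        }
        return "java"; // fall back to the first java on the PATH
    }
}
```

Usage: JavaLocator.resolveJava(System.getenv()).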
37455 No Perforce job exists for this issue. 2 32793
9 years, 4 weeks, 2 days ago
Reviewed
0|i05z5b:
ZooKeeper ZOOKEEPER-975

new peer goes in LEADING state even if ensemble is online

Bug Closed Major Fixed Vishal Kher Vishal Kher Vishal Kher 14/Jan/11 07:29   23/Nov/11 14:22 29/Apr/11 12:13 3.3.2 3.4.0     0 4   Scenario:
1. 2 of the 3 ZK nodes are online
2. The third node attempts to join
3. The third node unnecessarily goes into the "LEADING" state
4. The third node then goes back to LOOKING (no majority of followers) and finally goes to the FOLLOWING state.


While going through the logs, I noticed that a peer C that is trying to
join an already formed cluster goes into the LEADING state. This is because
the QuorumCnxManagers of A and B send the entire history of notification
messages to C. C receives the notification messages that were
exchanged between A and B when they were forming the cluster.

In FastLeaderElection.lookForLeader(), due to the following piece of
code, C quits lookForLeader assuming that it is supposed to lead.

//If have received from all nodes, then terminate
if ((self.getVotingView().size() == recvset.size()) &&
        (self.getQuorumVerifier().getWeight(proposedLeader) != 0)){
    self.setPeerState((proposedLeader == self.getId()) ?
            ServerState.LEADING: learningState());
    leaveInstance();
    return new Vote(proposedLeader, proposedZxid);

} else if (termPredicate(recvset,


This can cause:
1. C to unnecessarily go into the LEADING state, wait for tickTime * initLimit, and then restart FLE.

2. C waits for 200 ms (finalizeWait) and then considers whatever
notifications it has received to make a decision. C could potentially
decide to follow an old leader, fail to connect to the leader, and
then restart FLE. See code below.

if (termPredicate(recvset,
        new Vote(proposedLeader, proposedZxid,
                logicalclock))) {

    // Verify if there is any change in the proposed leader
    while((n = recvqueue.poll(finalizeWait,
            TimeUnit.MILLISECONDS)) != null){
        if(totalOrderPredicate(n.leader, n.zxid,
                proposedLeader, proposedZxid)){
            recvqueue.put(n);
            break;
        }
    }



In general, this does not affect correctness of FLE since C will
eventually go back to FOLLOWING state (A and B won't vote for
C). However, this delays C from joining the cluster. This can in turn
affect recovery time of an application.


Proposal: A and B should send only the latest notification (most
recent) instead of the entire history. Does this sound reasonable?
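A minimal sketch of the proposal, with hypothetical names: keep at most one pending notification per peer, so that enqueueing a new vote discards any older one that has not been sent yet. A capacity-1 ArrayBlockingQueue per sid is one way to do this; whether ZooKeeper adopted exactly this structure is not stated here.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

class LatestOnlySendQueues<T> {
    // One single-slot queue per peer sid.
    private final ConcurrentHashMap<Long, ArrayBlockingQueue<T>> queues = new ConcurrentHashMap<>();

    private ArrayBlockingQueue<T> queueFor(long sid) {
        return queues.computeIfAbsent(sid, k -> new ArrayBlockingQueue<>(1));
    }

    /** Replaces any unsent notification for sid with msg. */
    void offer(long sid, T msg) {
        ArrayBlockingQueue<T> q = queueFor(sid);
        synchronized (q) {   // serialize concurrent producers for the same peer
            q.clear();       // drop the stale notification, if any
            q.offer(msg);    // always succeeds: the single slot was just emptied
        }
    }

    /** Returns the latest pending notification for sid, or null if none. */
    T poll(long sid) {
        return queueFor(sid).poll();
    }
}
```

A joining peer such as C would then receive only each peer's most recent vote rather than the whole election history.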



47524 No Perforce job exists for this issue. 7 32794
8 years, 47 weeks, 5 days ago 0|i05z5j:
ZooKeeper ZOOKEEPER-974

Configurable listen socket backlog for the client port

Improvement Resolved Minor Fixed Josh Elser Hoonmin Kim Hoonmin Kim 10/Jan/11 03:35   04/Oct/19 10:55 13/Feb/19 07:13 3.3.2 3.6.0 server   0 4 0 10200   We have been running a ZooKeeper ensemble (3-node configuration) in production for months.
A few days ago, we suffered temporary (network?) problems that caused many reconnections (about 300) of clients holding ephemeral nodes on one ZooKeeper server.

Almost all clients successfully reconnected to the other ZooKeeper servers,
but one client failed to reconnect in time and got a session expired message from the server.
(The problem is that our clients die when they get a SessionExpired message.)

In the minute just before the problem, there were many listen queue overflows/drops and outgoing resets.

---

So we patched ZooKeeper to increase the backlog size of the client port socket to avoid cases like this.
Since ZooKeeper uses the default backlog size (50) in bind(), we added a "clientPortBacklog" option.

Although the default backlog should be fine for common environments,
we believe that making the size configurable is also useful.

[Note]
On Linux, the kernel parameter

net.core.somaxconn

needs to be at least as large as "clientPortBacklog" for the listen socket backlog to take effect, because the kernel caps the backlog at that value.
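A minimal sketch of binding the client port with a configurable backlog, using the JDK's ServerSocketChannel.bind(SocketAddress, int). The "clientPortBacklog" key is the name proposed in this report and is treated here as hypothetical; BacklogBind is not ZooKeeper code, and a non-positive backlog is passed through so that the JDK chooses its default.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.util.Properties;

class BacklogBind {
    // Hypothetical key taken from the issue text; not necessarily the name used by any release.
    static final String BACKLOG_KEY = "clientPortBacklog";

    /** Reads the backlog from the config; -1 means "use the JDK default". */
    static int backlogFrom(Properties cfg) {
        String v = cfg.getProperty(BACKLOG_KEY);
        return v == null ? -1 : Integer.parseInt(v.trim());
    }

    static ServerSocketChannel open(InetSocketAddress addr, int backlog) throws IOException {
        ServerSocketChannel ch = ServerSocketChannel.open();
        try {
            ch.bind(addr, backlog); // backlog <= 0 lets the implementation pick a default
            return ch;
        } catch (IOException e) {
            ch.close();
            throw e;
        }
    }
}
```

The kernel may still truncate the requested value (on Linux, to net.core.somaxconn), so the effective queue length is the smaller of the two.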
100% 100% 10200 0 pull-request-available 36639 No Perforce job exists for this issue. 3 42081
1 year, 5 weeks, 1 day ago backlog 0|i07kgf:
ZooKeeper ZOOKEEPER-973

bind() could fail on Leader because it does not setReuseAddress on its ServerSocket

Bug Resolved Trivial Fixed Harsh J Vishal Kher Vishal Kher 05/Jan/11 01:47   24/Jan/12 05:59 23/Jan/12 15:35 3.3.2 3.4.3, 3.3.5, 3.5.0 server   0 3   setReuseAddress(true) should be used below.

Leader(QuorumPeer self, LeaderZooKeeperServer zk) throws IOException {
    this.self = self;
    try {
        ss = new ServerSocket(self.getQuorumAddress().getPort());
    } catch (BindException e) {
        LOG.error("Couldn't bind to port "
                + self.getQuorumAddress().getPort(), e);
        throw e;
    }
    this.zk = zk;
}
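A sketch of the fix the report asks for, assuming the socket is created unbound so SO_REUSEADDR can be set before bind(); the java.net.ServerSocket documentation says the behavior of changing the option after the socket is bound is not defined, and the ServerSocket(int) constructor used above binds immediately. QuorumSocketBinder and bindWithReuse are hypothetical names.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

final class QuorumSocketBinder {
    /** Creates the quorum listen socket with SO_REUSEADDR enabled before binding. */
    static ServerSocket bindWithReuse(int port) throws IOException {
        ServerSocket ss = new ServerSocket(); // unbound, so the option can still be set
        try {
            ss.setReuseAddress(true);
            ss.bind(new InetSocketAddress(port));
            return ss;
        } catch (IOException e) {
            ss.close(); // do not leak the socket if bind() fails (e.g. BindException)
            throw e;
        }
    }
}
```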

36640 No Perforce job exists for this issue. 2 32795
8 years, 9 weeks, 2 days ago
Reviewed
0|i05z5r:
ZooKeeper ZOOKEEPER-972

perl Net::ZooKeeper segfaults when setting a watcher on get_children

Bug Open Major Unresolved Unassigned Robert Powers Robert Powers 03/Jan/11 14:06   05/Feb/20 07:16   3.3.2 3.7.0, 3.5.8 contrib-bindings   0 0   rhel 5.3, perl 5.10, Net::Zookeeper-1.35, zookeeper_c_client-3.3.2 and below. The issue I'm seeing seems strikingly similar to this: https://issues.apache.org/jira/browse/ZOOKEEPER-772

I have one writer process which adds sequenced children nodes to /queue and a separate reader process which sets a children watcher on /queue, waiting for children to be added or deleted. Long story short, every time a child node is added or deleted by the writer, the reader's watcher is supposed to trigger so the reader can check if it's time to get to work or go back to bed. Bad things seem to happen while the reader is waiting on the watcher and the writer adds or deletes a node.

In versions prior to 3.3.2, my code that sets a watcher on the children of a node using the perl binding would either lock up when trying to retrieve the children or segfault when a child node was added while waiting on the watch. In 3.3.2, it only seems to lock up.

I'm seeing this: assertion botched (free()ed/realloc()ed-away memory was overwritten?): !(MallocCfg[MallocCfg_filldead] && MallocCfg[MallocCfg_fillcheck]) || !cmp_pat_4bytes((unsigned char*)(p + 1), (((1 << ((bucket) >> 0)) + ((bucket >= 15 * 1) ? 4096 : 0)) - (sizeof(union overhead) + sizeof (unsigned int))) + sizeof (unsigned int), fill_deadbeef) (malloc.c:1536)

I managed to get a stack trace

Program received signal SIGABRT, Aborted.
0xffffe410 in __kernel_vsyscall ()
(gdb) where
#0 0xffffe410 in __kernel_vsyscall ()
#1 0xf7b8ed80 in raise () from /lib/libc.so.6
#2 0xf7b90691 in abort () from /lib/libc.so.6
#3 0xf7d6d53f in botch (diag=0xa <Address 0xa out of bounds>,
    s=0xf7ef42e8 "!(MallocCfg[MallocCfg_filldead] && MallocCfg[MallocCfg_fillcheck]) || !cmp_pat_4bytes((unsigned char*)(p + 1), (((1 << ((bucket) >> 0)) + ((bucket >= 15 * 1) ? 4096 : 0)) - (sizeof(union overhead) + s"..., file=0xf7ef4119 "malloc.c", line=1536) at malloc.c:1327
#4 0xf7d6d97a in Perl_malloc (nbytes=15530) at malloc.c:1535
#5 0xf7d6f974 in Perl_calloc (elements=1, size=0) at malloc.c:2314
#6 0xf7929eca in _zk_create_watch (my_perl=0x0) at ZooKeeper.xs:204
#7 0xf7929f8f in _zk_acquire_watch (my_perl=0x0) at ZooKeeper.xs:240
#8 0xf793450b in XS_Net__ZooKeeper_watch (my_perl=0x889c008, cv=0x89db8b4) at ZooKeeper.xs:2035
#9 0xf7e1dd67 in Perl_pp_entersub (my_perl=0x889c008) at pp_hot.c:2847
#10 0xf7de47ce in Perl_runops_debug (my_perl=0x889c008) at dump.c:1931
#11 0xf7e0d856 in perl_run (my_perl=0x889c008) at perl.c:2384
#12 0x08048ace in main (argc=2, argv=0xffe11814, env=0xffe11820) at perlmain.c:113

The code to reproduce:
use Net::ZooKeeper;

sub bide_time
{
    my $root = '/queue';
    my $timeout = 20*1000;
    my $zkc = Net::ZooKeeper->new('localhost:2181');

    while (1) {
        print "Retrieving $root\n";
        my $child_watch = $zkc->watch('timeout' => $timeout);

        my @children = $zkc->get_children($root, watch=>$child_watch);
        if (scalar(@children)) {
            return @children if (rand(1) > 0.75);
        } else {
            print " - No Children.\n";
        }
        print "Time to wait for the Children.\n";
        if ($child_watch->wait()) {
            print "watch triggered on node $root:\n";
            print " event: $child_watch->{event}\n";
            print " state: $child_watch->{state}\n";
        } else {
            print "watch timed out\n";
        }
    }
}
36641 No Perforce job exists for this issue. 0 32796
9 years, 12 weeks, 3 days ago 0|i05z5z:
ZooKeeper ZOOKEEPER-971

Replace Packet class with Operation classes

Improvement Open Minor Unresolved Thomas Koch Thomas Koch Thomas Koch 30/Dec/10 13:20   30/Dec/10 13:20           0 1   The operation classes introduced in ZOOKEEPER-911 can be used to replace the Packet class entirely.
Then it would also be possible to move the code from the ugly big if clause in EventThread.processEvent to the individual operation classes.

This cleanup may help to prepare the code for the move from jute to avro.
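A rough sketch of the idea, assuming each operation class carries its own result-delivery logic so the event thread dispatches with a virtual call instead of a chain of type checks. Operation, CreateOperation, DeleteOperation and EventDispatcher are hypothetical names; the real operation classes from ZOOKEEPER-911 may look quite different.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical operation type: each op knows how to deliver its own result. */
interface Operation {
    String path();
    void deliverResult(int rc, List<String> delivered);
}

final class CreateOperation implements Operation {
    private final String path;
    CreateOperation(String path) { this.path = path; }
    @Override public String path() { return path; }
    @Override public void deliverResult(int rc, List<String> delivered) {
        delivered.add("create " + path + " rc=" + rc);
    }
}

final class DeleteOperation implements Operation {
    private final String path;
    DeleteOperation(String path) { this.path = path; }
    @Override public String path() { return path; }
    @Override public void deliverResult(int rc, List<String> delivered) {
        delivered.add("delete " + path + " rc=" + rc);
    }
}

final class EventDispatcher {
    private final List<String> delivered = new ArrayList<>();

    /** Replaces the type-switching if/else chain: dispatch is a virtual call. */
    void processEvent(Operation op, int rc) {
        op.deliverResult(rc, delivered);
    }

    List<String> delivered() { return delivered; }
}
```

Adding a new operation then means adding a class, without touching the dispatcher.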
214211 No Perforce job exists for this issue. 0 42082
9 years, 13 weeks ago 0|i07kgn:
ZooKeeper ZOOKEEPER-970

ZOOKEEPER-835 Review and refactor Java client close logic

Sub-task Open Major Unresolved Thomas Koch Thomas Koch Thomas Koch 30/Dec/10 12:14   30/Dec/10 13:29           0 1   There have been several jira tickets to fix the close logic, but, as discovered in ZOOKEEPER-911, there are still ways for it to block.

For example the failing server.InvalidSnapshotTest times out because the ClientCnxn.close() call blocks in Packet.waitForFinish().

However, the only change introduced there is that instead of

synchronized (packet) { while (!packet.finished) packet.wait(); }

I call packet.waitForFinish() which is a synchronized method.

The bug is in ClientCnxn.queuePacket:
ClientCnxn.closing is set to true before the closeSession Packet is added to outgoingQueue. Between these two steps, the SendThread may already have terminated, so there's nobody left to call packet.notifyAll().
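A sketch of one way to close that window, assuming a hypothetical connection class: the close packet is enqueued before closing is set, both under one lock, so the send loop can only stop after it has drained (and finished) the close packet; waitForFinish also takes a timeout so close() cannot hang forever. SketchPacket, SketchConnection, queueClosePacket and sendOnce are illustrative names, not ClientCnxn's API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.TimeUnit;

final class SketchPacket {
    private boolean finished; // guarded by this

    synchronized void finish() {
        finished = true;
        notifyAll();
    }

    /** Waits until finish() is called; returns false on timeout instead of blocking forever. */
    synchronized boolean waitForFinish(long timeoutMs) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!finished) {
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMs <= 0) return false; // never call wait(0), which waits indefinitely
            wait(remainingMs);
        }
        return true;
    }
}

final class SketchConnection {
    private final Object queueLock = new Object();
    private final Deque<SketchPacket> outgoing = new ArrayDeque<>(); // guarded by queueLock
    private boolean closing; // guarded by queueLock

    /** Enqueues the close packet, then sets closing, in one critical section. */
    boolean queueClosePacket(SketchPacket closePacket) {
        synchronized (queueLock) {
            if (closing) return false;
            outgoing.add(closePacket);
            closing = true;
            return true;
        }
    }

    /** One step of a hypothetical send loop; returns false once closing and drained. */
    boolean sendOnce() {
        synchronized (queueLock) {
            SketchPacket p = outgoing.poll();
            if (p != null) {
                p.finish(); // stands in for "sent and acknowledged by the server"
                return true;
            }
            return !closing;
        }
    }
}
```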
214210 No Perforce job exists for this issue. 0 42083
9 years, 13 weeks ago 0|i07kgv:
ZooKeeper ZOOKEEPER-969

ZOOKEEPER-835 stat parameter in asynchronous getACL() method is superfluous

Sub-task Open Major Unresolved Thomas Koch Thomas Koch Thomas Koch 29/Dec/10 11:26   29/Dec/10 11:26           0 1   214209 No Perforce job exists for this issue. 0 42084
9 years, 13 weeks, 1 day ago 0|i07kh3:
ZooKeeper ZOOKEEPER-968

ZOOKEEPER-965 Database multi-update

Sub-task Closed Major Not A Problem Unassigned Ted Dunning Ted Dunning 29/Dec/10 02:06   23/Nov/11 14:22 16/Jul/11 15:12   3.4.0     0 1   This includes the database operations themselves 67883 No Perforce job exists for this issue. 0 33354
8 years, 36 weeks, 5 days ago 0|i062lz:
ZooKeeper ZOOKEEPER-967

ZOOKEEPER-965 Server side decoding and function dispatch

Sub-task Closed Major Fixed Unassigned Ted Dunning Ted Dunning 29/Dec/10 02:05   23/Nov/11 14:22 02/May/11 13:22   3.4.0     0 0   This would include making the server catch the request and hand it down to the actual transaction code 47525 No Perforce job exists for this issue. 0 33355
8 years, 47 weeks, 3 days ago 0|i062m7:
ZooKeeper ZOOKEEPER-966

ZOOKEEPER-965 Client side for multi

Sub-task Closed Major Fixed Unassigned Ted Dunning Ted Dunning 29/Dec/10 02:04   23/Nov/11 14:22 02/May/11 13:21   3.4.0     0 2   This is just the client side of the code up to and including the serialization of requests. 47526 No Perforce job exists for this issue. 0 33356
8 years, 47 weeks, 3 days ago 0|i062mf:
ZooKeeper ZOOKEEPER-965

Need a multi-update command to allow multiple znodes to be updated safely

New Feature Closed Major Fixed Ted Dunning Ted Dunning Ted Dunning 27/Dec/10 19:18   23/Nov/11 14:22 30/Jun/11 18:54 3.3.3 3.4.0     0 14   ZOOKEEPER-966, ZOOKEEPER-967, ZOOKEEPER-968 The basic idea is to have a single method called "multi" that will accept a list of create, delete, update or check objects each of which has a desired version or file state in the case of create. If all of the version and existence constraints can be satisfied, then all updates will be done atomically.

Two API styles have been suggested. One has a list as above and the other style has a "Transaction" that allows builder-like methods to build a set of updates and a commit method to finalize the transaction. This can trivially be reduced to the first kind of API so the list based API style should be considered the primitive and the builder style should be implemented as syntactic sugar.

The total size of all the data in all updates and creates in a single transaction should be limited to 1MB.
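A minimal in-memory sketch of the list-based primitive with the builder style layered on top of it. It assumes immutable node records, -1 as "any version" (mirroring ZooKeeper's convention), and the 1MB cap above; MultiStore, Op, Kind and Transaction are hypothetical names, and the sketch does not talk to a ZooKeeper server.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MultiStore {
    static final int MAX_TXN_BYTES = 1024 * 1024; // the 1MB limit proposed above
    static final int ANY_VERSION = -1;            // assumption: -1 matches any version

    enum Kind { CREATE, DELETE, SET, CHECK }

    static final class Op {
        final Kind kind; final String path; final byte[] data; final int version;
        Op(Kind kind, String path, byte[] data, int version) {
            this.kind = kind; this.path = path; this.data = data; this.version = version;
        }
    }

    private static final class Node {
        final byte[] data; final int version;
        Node(byte[] data, int version) { this.data = data; this.version = version; }
    }

    private final Map<String, Node> nodes = new HashMap<>();

    /** The primitive: apply every op to a scratch copy, then publish all or nothing. */
    synchronized boolean multi(List<Op> ops) {
        long total = 0;
        for (Op op : ops) total += op.data == null ? 0 : op.data.length;
        if (total > MAX_TXN_BYTES) throw new IllegalArgumentException("transaction larger than 1MB");
        Map<String, Node> scratch = new HashMap<>(nodes); // Nodes are immutable: shallow copy suffices
        for (Op op : ops) {
            Node n = scratch.get(op.path);
            if (op.kind == Kind.CREATE) {
                if (n != null) return false;
                scratch.put(op.path, new Node(op.data, 0));
                continue;
            }
            if (n == null || (op.version != ANY_VERSION && n.version != op.version)) return false;
            if (op.kind == Kind.DELETE) scratch.remove(op.path);
            else if (op.kind == Kind.SET) scratch.put(op.path, new Node(op.data, n.version + 1));
            // CHECK only asserts existence and version; it changes nothing.
        }
        nodes.clear();
        nodes.putAll(scratch);
        return true;
    }

    synchronized Integer version(String path) {
        Node n = nodes.get(path);
        return n == null ? null : n.version;
    }

    /** Builder-style sugar that reduces to multi(). */
    Transaction transaction() { return new Transaction(); }

    final class Transaction {
        private final List<Op> ops = new ArrayList<>();
        Transaction create(String path, byte[] data) { ops.add(new Op(Kind.CREATE, path, data, ANY_VERSION)); return this; }
        Transaction setData(String path, byte[] data, int version) { ops.add(new Op(Kind.SET, path, data, version)); return this; }
        Transaction delete(String path, int version) { ops.add(new Op(Kind.DELETE, path, null, version)); return this; }
        Transaction check(String path, int version) { ops.add(new Op(Kind.CHECK, path, null, version)); return this; }
        boolean commit() { return multi(ops); }
    }
}
```

Validating against a scratch copy lets later operations in the list see the effects of earlier ones (e.g. a create followed by a check on the same path), while a single failed constraint discards the whole batch.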

Implementation-wise this capability can be done using standard ZK internals. The changes include:

- updates to the ZK clients to support the new call

- additional wire level request

- on the server, in the code that converts transactions to idempotent form, the code should be slightly extended to convert a list of operations to idempotent form.

- on the client, a down-rev server that rejects the multi-update should be detected gracefully and an informative exception should be thrown.

To facilitate shared development, I have established a github repository at https://github.com/tdunning/zookeeper and am happy to extend committer status to anyone who agrees to donate their code back to Apache. The final patch will be attached to this bug as normal.
47527 No Perforce job exists for this issue. 21 33357
8 years, 38 weeks, 4 days ago
Reviewed
0|i062mn:
ZooKeeper ZOOKEEPER-964

How to avoid generating dead nodes? These nodes can't be deleted because their parent doesn't have delete and setacl permissions.

Wish Resolved Major Won't Fix Unassigned allengao allengao 26/Dec/10 22:26   15/May/14 16:53 15/May/14 16:53 3.3.2 3.5.0 server   0 4 1209600 1209600 0% i686-suse-linux When a node is created without setacl and delete permissions (e.g. permits=0x01), its children can never be deleted, even when using superDigest. So, how can this situation be avoided? 0% 0% 1209600 1209600 36642 No Perforce job exists for this issue. 1 42085
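A small sketch decoding the permission bits, assuming the values defined in ZooKeeper's ZooDefs.Perms (READ=1, WRITE=2, CREATE=4, DELETE=8, ADMIN=16). It shows why permits=0x01 (READ only) leaves a node stuck under ordinary ACL checks: deleting a child is checked against the parent's DELETE bit, and changing the parent's ACL to grant DELETE needs its ADMIN bit. AclPerms and its methods are hypothetical helpers, not ZooKeeper API.

```java
final class AclPerms {
    // Bit values as in ZooKeeper's ZooDefs.Perms.
    static final int READ = 1 << 0;
    static final int WRITE = 1 << 1;
    static final int CREATE = 1 << 2;
    static final int DELETE = 1 << 3;
    static final int ADMIN = 1 << 4;
    static final int ALL = READ | WRITE | CREATE | DELETE | ADMIN;

    /** Deleting a child is authorized against the DELETE bit of the parent's ACL. */
    static boolean canDeleteChildOf(int parentPerms) { return (parentPerms & DELETE) != 0; }

    /** Changing a node's ACL (setACL) requires the ADMIN bit on that node. */
    static boolean canSetAcl(int perms) { return (perms & ADMIN) != 0; }

    /** Renders the bits as letters, e.g. 0x01 -> "r". */
    static String describe(int perms) {
        StringBuilder sb = new StringBuilder();
        if ((perms & READ) != 0) sb.append('r');
        if ((perms & WRITE) != 0) sb.append('w');
        if ((perms & CREATE) != 0) sb.append('c');
        if ((perms & DELETE) != 0) sb.append('d');
        if ((perms & ADMIN) != 0) sb.append('a');
        return sb.toString();
    }
}
```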
5 years, 45 weeks, 6 days ago dead node 0|i07khb:
ZooKeeper ZOOKEEPER-963

Make Forrest work with JDK6

Bug Closed Major Fixed Carl Steinbach Carl Steinbach Carl Steinbach 23/Dec/10 03:34   23/Nov/11 14:22 28/Dec/10 20:08   3.3.3, 3.4.0 build, documentation   0 1   It's possible to make Forrest work with JDK6 by disabling sitemap validation
in the forrest.properties file. See FOR-984 and PIG-1508 for more details.
47528 No Perforce job exists for this issue. 1 32797
9 years, 13 weeks, 1 day ago
Reviewed
0|i05z67:
ZooKeeper ZOOKEEPER-962

leader/follower coherence issue when follower is receiving a DIFF

Bug Closed Critical Fixed Chia-Hung Lin Camille Fournier Camille Fournier 21/Dec/10 13:42   23/Nov/11 14:22 23/Jan/11 00:31 3.3.2 3.3.3, 3.4.0 server   0 3   From mailing list:
It seems like we rely on the LearnerHandler thread startup to capture all of the missing committed
transactions in the SNAP or DIFF, but I don't see anything (especially in the DIFF case) that
is preventing us from committing more transactions before we actually start forwarding updates
to the new follower.

Let me explain using my example from ZOOKEEPER-919. Assume we have quorum already, so the
leader can be processing transactions while my follower is starting up.

I'm a follower at zxid N-5, the leader is at N. I send my FOLLOWERINFO packet to the leader
with that information. The leader gets the proposals from its committed log (time T1), then
syncs on the proposal list (LearnerHandler line 267. Why? It's a copy of the underlying proposal
list... this might be part of our problem). I check to see if the peerLastZxid is within my
max and min committed log and it is, so I'm going to send a diff. I set the zxidToSend to
be the maxCommittedLog at time T3 (we already know this is sketchy), and forward the proposals
from my copied proposal list starting at the peerLastZxid+1 up to the last proposal transaction
(as seen at time T1).

After I have queued up all those diffs to send, I tell the leader to startForwarding updates
to this follower (line 308).

So, let's say that at time T2 I actually swap out the leader to the thread that is handling
the various request processors, and see that I got enough votes to commit zxid N+1. I commit
N+1 and so my maxCommittedLog at T3 is N+1, but this proposal is not in the list of proposals
that I got back at time T1, so I don't forward this diff to the client. Additionally, I processed
the commit and removed it from my leader's toBeApplied list. So when I call startForwarding
for this new follower, I don't see this transaction as a transaction to be forwarded.

There's one more problem. Let's imagine instead that I commit N+1 at time T4. The maxCommittedLog
value is consistent with the max of the diff packets I am going to send the follower. But
I still committed N+1 and removed it from the toBeApplied list before calling startForwarding
with this follower. How does the follower get this transaction? Does it?

To put it another way, here is the thread interaction, hopefully formatted so you can read
it...

LH = LearnerHandlerThread, RPT = RequestProcessorThread
T1(LH): get list of proposals (COPY)
T2(RPT): commit N+1, remove from toBeApplied
T3(LH): get maxCommittedLog
T4(LH): send diffs from view at T1
T5(LH): startForwarding


Or
T1(LH): get list of proposals (COPY)
T2(LH): get maxCommittedLog
T3(RPT): commit N+1, remove from toBeApplied
T4(LH): send diffs from view at T1
T5(LH): startForwarding


I'm trying to figure out what, if anything, keeps the requests from being committed, removed,
and never seen by the follower before it fully starts up.
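A simplified sketch of one way to rule out both interleavings above, assuming a hypothetical leader where the commit path and the follower-sync path share one lock: copying the DIFF and registering the follower for forwarding happen in a single critical section, so a commit lands either in the DIFF or in the forwarded stream, never in neither. SketchLeader, commit, syncFollower and received are illustrative names, not the real Leader/LearnerHandler code.

```java
import java.util.ArrayList;
import java.util.List;

final class SketchLeader {
    private final Object commitLock = new Object();
    private final List<Long> committedLog = new ArrayList<>();     // guarded by commitLock
    private final List<List<Long>> forwarding = new ArrayList<>(); // guarded by commitLock

    /** Commit path (the RequestProcessor thread in the timelines above). */
    void commit(long zxid) {
        synchronized (commitLock) {
            committedLog.add(zxid);
            for (List<Long> followerQueue : forwarding) followerQueue.add(zxid);
        }
    }

    /** Sync path (the LearnerHandler thread): DIFF copy plus startForwarding, atomically. */
    int syncFollower(long peerLastZxid) {
        synchronized (commitLock) {
            List<Long> followerQueue = new ArrayList<>();
            for (long zxid : committedLog) {
                if (zxid > peerLastZxid) followerQueue.add(zxid);
            }
            forwarding.add(followerQueue);
            return forwarding.size() - 1;
        }
    }

    /** Everything sent to a follower so far (DIFF followed by forwarded commits). */
    List<Long> received(int followerId) {
        synchronized (commitLock) {
            return new ArrayList<>(forwarding.get(followerId));
        }
    }
}
```

The trade-off is that commits pause while a DIFF is being copied; a real implementation would have to weigh that against the size of the committed log.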

47529 No Perforce job exists for this issue. 6 32798
9 years, 9 weeks, 4 days ago 0|i05z6f:
ZooKeeper ZOOKEEPER-961

Watch recovery after disconnection when connection string contains a prefix

Bug Closed Critical Fixed Matthias Spycher pmpm47 pmpm47 21/Dec/10 11:07   23/Nov/11 14:22 14/Sep/11 01:51 3.3.1 3.3.4, 3.4.0 java client   0 3   Windows 32 bits Let's say you're using connection string "127.0.0.1:2182/foo".
1) put a childrenchanged watch on relative / (that is, on absolute path /foo)
2) stop the zk server
3) start the zk server
4) at this point, the client recovers the connection, and should have put back a watch on relative path /, but instead the client puts a watch on the *absolute* path /
- if some other client adds or removes a node under /foo, nothing will happen
- if some other client adds or removes a node under /, then you will get an error from the zk client library (string operation error)
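A sketch of the path translation involved, assuming the chroot suffix of the connection string ("/foo" in "127.0.0.1:2182/foo") is kept separately from the client-relative watch paths. Re-registering watches after a reconnect has to apply the same client-to-server conversion as the original request; step 4 above looks like what happens when that conversion is skipped. ChrootPaths and its methods are hypothetical helpers.

```java
import java.util.ArrayList;
import java.util.List;

final class ChrootPaths {
    /** Client-relative path -> absolute server path; chroot is null or e.g. "/foo". */
    static String prependChroot(String chroot, String clientPath) {
        if (chroot == null) return clientPath;
        return clientPath.equals("/") ? chroot : chroot + clientPath;
    }

    /** Absolute server path -> client-relative path; rejects paths outside the chroot. */
    static String stripChroot(String chroot, String serverPath) {
        if (chroot == null) return serverPath;
        if (serverPath.equals(chroot)) return "/";
        if (serverPath.startsWith(chroot + "/")) return serverPath.substring(chroot.length());
        throw new IllegalArgumentException(serverPath + " is outside chroot " + chroot);
    }

    /** On reconnect, watches stored under client paths must be re-sent as server paths. */
    static List<String> watchesToResend(String chroot, List<String> clientWatchPaths) {
        List<String> serverPaths = new ArrayList<>();
        for (String p : clientWatchPaths) serverPaths.add(prependChroot(chroot, p));
        return serverPaths;
    }
}
```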
34438 No Perforce job exists for this issue. 5 32799
8 years, 27 weeks, 2 days ago
Reviewed
disconnected watch 0|i05z6n:
ZooKeeper ZOOKEEPER-958

Flag to turn off autoconsume in hedwig c++ client

Bug Closed Major Fixed Ivan Kelly Ivan Kelly Ivan Kelly 15/Dec/10 04:18   23/Nov/11 14:22 21/Dec/10 14:34 3.4.0 3.4.0 contrib-hedwig   0 1   Currently the hedwig cpp client will automatically send a consume message to the server when the calling client indicated that it has received the message. If the client wants to queue the messages and not acknowledge them to the server immediately, they need to block, which means interfering with any other running callbacks. 47530 No Perforce job exists for this issue. 1 32800
9 years, 14 weeks ago
Reviewed
0|i05z6v:
ZooKeeper ZOOKEEPER-957

zkCleanup.sh doesn't do anything

Bug Closed Major Fixed Ted Dunning Ted Dunning Ted Dunning 13/Dec/10 12:09   23/Nov/11 14:21 14/Dec/10 22:17 3.3.2 3.3.3, 3.4.0     0 1   Somebody left some echo statements in the zkCleanup.sh which prevents the java commands from actually running.

Patch coming forthwith.
47531 No Perforce job exists for this issue. 1 32801
9 years, 15 weeks, 1 day ago
Reviewed
0|i05z73:
ZooKeeper ZOOKEEPER-955

Use Atomic(Integer|Long) for (Z)Xid

Improvement Resolved Trivial Won't Fix Thomas Koch Thomas Koch Thomas Koch 07/Dec/10 05:41   16/May/14 18:34 16/May/14 18:34   3.5.0 java client, server   0 2   As I read last weekend in the fantastic book "Clean Code" [1], it would be much faster to use AtomicInteger or AtomicLong instead of synchronized blocks around each access to an int or long.
The key difference is that a synchronized block always acquires and releases a lock. The atomic classes use "optimistic locking": a CPU compare-and-swap (CAS) operation that only changes a value if it has not changed since the last read.
In most cases the value has not changed since the last read, so the operation is about as fast as a plain one. If it has changed, we read again and retry the update.
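A small sketch comparing the two approaches for an xid counter, using java.util.concurrent.atomic.AtomicLong. nextWithExplicitCas spells out the optimistic read, compare-and-set, retry loop that incrementAndGet performs internally. SynchronizedXid and AtomicXid are hypothetical names; whether the atomic version is measurably faster depends on contention and the JVM, since CAS retries also cost time when many threads collide.

```java
import java.util.concurrent.atomic.AtomicLong;

final class XidCounters {
    /** Lock-based: every call acquires and releases the object's monitor. */
    static final class SynchronizedXid {
        private long xid;
        synchronized long next() { return ++xid; }
        synchronized long get() { return xid; }
    }

    /** Lock-free: relies on the compare-and-swap behind AtomicLong. */
    static final class AtomicXid {
        private final AtomicLong xid = new AtomicLong();

        long next() { return xid.incrementAndGet(); }

        /** The same increment written as an explicit optimistic retry loop. */
        long nextWithExplicitCas() {
            while (true) {
                long current = xid.get();
                long updated = current + 1;
                if (xid.compareAndSet(current, updated)) {
                    return updated;
                }
                // Another thread changed xid since we read it: re-read and retry.
            }
        }

        long get() { return xid.get(); }
    }
}
```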

[1] Clean Code: A Handbook of Agile Software Craftsmanship (Robert C. Martin)
71224 No Perforce job exists for this issue. 1 42086
5 years, 45 weeks, 6 days ago
Reviewed
Atomic 0|i07khj:
ZooKeeper ZOOKEEPER-954

Findbugs/ClientCnxn: Bug type JLM_JSR166_UTILCONCURRENT_MONITORENTER

Bug Patch Available Minor Unresolved Hiroshi Ikeda Thomas Koch Thomas Koch 29/Nov/10 04:21   02/Mar/16 20:47       java client   0 2   JLM Synchronization performed on java.util.concurrent.LinkedBlockingQueue in org.apache.zookeeper.ClientCnxn$EventThread.queuePacket(ClientCnxn$Packet)

Bug type JLM_JSR166_UTILCONCURRENT_MONITORENTER (click for details)
In class org.apache.zookeeper.ClientCnxn$EventThread
In method org.apache.zookeeper.ClientCnxn$EventThread.queuePacket(ClientCnxn$Packet)
Type java.util.concurrent.LinkedBlockingQueue
Value loaded from field org.apache.zookeeper.ClientCnxn$EventThread.waitingEvents
At ClientCnxn.java:[line 411]
JLM Synchronization performed on java.util.concurrent.LinkedBlockingQueue in org.apache.zookeeper.ClientCnxn$EventThread.run()

Bug type JLM_JSR166_UTILCONCURRENT_MONITORENTER (click for details)
In class org.apache.zookeeper.ClientCnxn$EventThread
In method org.apache.zookeeper.ClientCnxn$EventThread.run()
Type java.util.concurrent.LinkedBlockingQueue
Value loaded from field org.apache.zookeeper.ClientCnxn$EventThread.waitingEvents
At ClientCnxn.java:[line 436]

The respective code:

409 public void queuePacket(Packet packet) {
410 if (wasKilled) {
411 synchronized (waitingEvents) {
412 if (isRunning) waitingEvents.add(packet);
413 else processEvent(packet);
414 }
415 } else {
416 waitingEvents.add(packet);
417 }
418 }
419
420 public void queueEventOfDeath() {
421 waitingEvents.add(eventOfDeath);
422 }
423
424 @Override
425 public void run() {
426 try {
427 isRunning = true;
428 while (true) {
429 Object event = waitingEvents.take();
430 if (event == eventOfDeath) {
431 wasKilled = true;
432 } else {
433 processEvent(event);
434 }
435 if (wasKilled)
436 synchronized (waitingEvents) {
437 if (waitingEvents.isEmpty()) {
438 isRunning = false;
439 break;
440 }
441 }
442 }
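A sketch of one way to satisfy the Findbugs rule, assuming the coordination between queuePacket() and run() moves to a private lock object so the LinkedBlockingQueue is never used as a monitor. SketchEventThread, stateLock and finished are hypothetical names. As a design choice, this version takes the lock on every queuePacket() call instead of first checking an unsynchronized wasKilled flag, trading the lock-free fast path for a simpler argument that no packet can be queued after the thread has decided to exit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

final class SketchEventThread extends Thread {
    private static final Object EVENT_OF_DEATH = new Object();
    private final LinkedBlockingQueue<Object> waitingEvents = new LinkedBlockingQueue<>();
    // Dedicated monitor: the j.u.c queue itself is never synchronized on.
    private final Object stateLock = new Object();
    private boolean finished = false; // guarded by stateLock
    final List<Object> processed = Collections.synchronizedList(new ArrayList<>());

    void queuePacket(Object packet) {
        synchronized (stateLock) {
            if (!finished) {
                waitingEvents.add(packet); // run() will pick it up
                return;
            }
        }
        processEvent(packet); // event thread already exited: deliver on the caller's thread
    }

    void queueEventOfDeath() {
        waitingEvents.add(EVENT_OF_DEATH);
    }

    @Override
    public void run() {
        boolean wasKilled = false;
        try {
            while (true) {
                Object event = waitingEvents.take();
                if (event == EVENT_OF_DEATH) {
                    wasKilled = true;
                } else {
                    processEvent(event);
                }
                if (wasKilled) {
                    synchronized (stateLock) {
                        if (waitingEvents.isEmpty()) {
                            finished = true;
                            break;
                        }
                    }
                }
            }
        } catch (InterruptedException e) {
            // Interruption is not handled beyond restoring the flag in this sketch.
            Thread.currentThread().interrupt();
        }
    }

    private void processEvent(Object event) {
        processed.add(event);
    }
}
```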
36643 No Perforce job exists for this issue. 2 32802
4 years, 5 weeks, 3 days ago 0|i05z7b:
ZooKeeper ZOOKEEPER-953

ZOOKEEPER-940 review project branding requirements, report to board

Sub-task Resolved Major Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 24/Nov/10 21:04   07/Feb/11 13:30 07/Feb/11 13:30         0 0   47532 No Perforce job exists for this issue. 0 33358
9 years, 7 weeks, 3 days ago 0|i062mv:
ZooKeeper ZOOKEEPER-952

ZOOKEEPER-940 scrub codebase for references to pre-TLP locations.

Sub-task Resolved Major Not A Problem Mahadev Konar Patrick D. Hunt Patrick D. Hunt 24/Nov/10 18:06   08/Oct/13 17:55 08/Oct/13 17:55         0 0   The codebase needs to be scrubbed of references to hadoop and old locations (web site, wiki, svn, mailing lists, etc...)
214208 No Perforce job exists for this issue. 0 42087
9 years, 16 weeks, 6 days ago 0|i07khr:
ZooKeeper ZOOKEEPER-951

ZOOKEEPER-940 monthly board reports for first 3 months (then quarterly reports)

Sub-task Resolved Major Implemented Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 24/Nov/10 18:01   08/Oct/13 17:55 08/Oct/13 17:55         0 0   Board reporting guidelines can be found here:
http://apache.org/foundation/board/reporting
note that ZOOKEEPER-953 should also be addressed (branding checklist)
214207 No Perforce job exists for this issue. 0 42088
9 years, 18 weeks, 1 day ago 0|i07khz:
ZooKeeper ZOOKEEPER-950

ZOOKEEPER-940 create bylaws

Sub-task Resolved Major Fixed Unassigned Patrick D. Hunt Patrick D. Hunt 24/Nov/10 17:59   07/Feb/11 13:40 07/Feb/11 13:40         0 0   47533 No Perforce job exists for this issue. 0 33359
9 years, 7 weeks, 3 days ago 0|i062n3:
ZooKeeper ZOOKEEPER-949

ZOOKEEPER-940 work with infra to move the git mirror

Sub-task Resolved Major Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 24/Nov/10 17:37   30/Nov/10 13:14 30/Nov/10 13:14         0 0   47534 No Perforce job exists for this issue. 0 33360
9 years, 17 weeks, 2 days ago 0|i062nb:
ZooKeeper ZOOKEEPER-948

ZOOKEEPER-940 send mail to the zk mailing lists about the list name changes

Sub-task Resolved Major Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 24/Nov/10 13:16   24/Nov/10 15:58 24/Nov/10 15:58         0 0   47535 No Perforce job exists for this issue. 0 33361
9 years, 18 weeks, 1 day ago 0|i062nj:
ZooKeeper ZOOKEEPER-947

ZOOKEEPER-940 move the wiki content to its new home

Sub-task Resolved Major Fixed Benjamin Reed Patrick D. Hunt Patrick D. Hunt 24/Nov/10 12:50   07/Feb/11 00:31 07/Feb/11 00:31         0 0   47536 No Perforce job exists for this issue. 0 33362
9 years, 7 weeks, 3 days ago 0|i062nr:
ZooKeeper ZOOKEEPER-946

ZOOKEEPER-940 update howtorelease page with new details (svn, filepaths, notifications and such)

Sub-task Resolved Major Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 24/Nov/10 12:49   11/Mar/11 01:25 11/Mar/11 01:25         0 0   47537 No Perforce job exists for this issue. 0 33363
9 years, 2 weeks, 6 days ago 0|i062nz:
ZooKeeper ZOOKEEPER-945

ZOOKEEPER-940 update legacy website with new mailing list details

Sub-task Resolved Major Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 24/Nov/10 12:47   24/Nov/10 17:58 24/Nov/10 17:58         0 0   47538 No Perforce job exists for this issue. 0 33364
9 years, 18 weeks, 1 day ago 0|i062o7:
ZooKeeper ZOOKEEPER-944

ZOOKEEPER-940 perform a svn move to move the ZK codebase out from under hadoop

Sub-task Resolved Major Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 24/Nov/10 12:47   24/Nov/10 17:19 24/Nov/10 17:17         0 0   47539 No Perforce job exists for this issue. 0 33365
9 years, 18 weeks, 1 day ago 0|i062of:
ZooKeeper ZOOKEEPER-943

ZOOKEEPER-940 address hudson configuration change for svn move

Sub-task Resolved Major Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 24/Nov/10 12:46   24/Nov/10 17:58 24/Nov/10 17:14         0 0   47540 No Perforce job exists for this issue. 0 33366
9 years, 18 weeks, 1 day ago 0|i062on:
ZooKeeper ZOOKEEPER-942

ZOOKEEPER-940 address hudson configuration change for mailing list

Sub-task Resolved Major Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 24/Nov/10 12:46   24/Nov/10 15:57 24/Nov/10 15:57         0 0   47541 No Perforce job exists for this issue. 0 33367
9 years, 18 weeks, 1 day ago 0|i062ov:
ZooKeeper ZOOKEEPER-941

ZOOKEEPER-940 setup the new website on zookeeper.apache.org

Sub-task Resolved Major Fixed Benjamin Reed Patrick D. Hunt Patrick D. Hunt 24/Nov/10 12:45   07/Feb/11 13:40 07/Feb/11 13:40         0 0   This uses the new CMS system. See INFRA-3228 for details. 47542 No Perforce job exists for this issue. 0 33368
9 years, 16 weeks, 6 days ago 0|i062p3:
ZooKeeper ZOOKEEPER-940

Umbrella JIRA for move to TLP

Task Resolved Major Implemented Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 24/Nov/10 12:42   08/Oct/13 17:55 08/Oct/13 17:55         0 0   ZOOKEEPER-941, ZOOKEEPER-942, ZOOKEEPER-943, ZOOKEEPER-944, ZOOKEEPER-945, ZOOKEEPER-946, ZOOKEEPER-947, ZOOKEEPER-948, ZOOKEEPER-949, ZOOKEEPER-950, ZOOKEEPER-951, ZOOKEEPER-952, ZOOKEEPER-953 This is an umbrella jira for our move to TLP status.

Please create subtasks for any issues you find related to the move. Note that INFRA-3228 is now closed, so a number of infra related issues have already been closed. This jira (subs) is for additional issues we need to address.
214206 No Perforce job exists for this issue. 0 42089
9 years, 18 weeks, 1 day ago 0|i07ki7:
ZooKeeper ZOOKEEPER-939

the threads number of a zookeeper is increased all the time

Bug Resolved Major Duplicate Unassigned Qian Ye Qian Ye 24/Nov/10 02:53   05/Sep/11 23:11 05/Sep/11 23:10 3.3.0   server   0 0   Linux 2.6.9-52bs #2 SMP Fri Jan 26 13:34:38 CST 2007 x86_64 x86_64 x86_64 GNU/Linux I have a group of three zookeeper servers:
server.0=10.81.4.11:2888:3888
server.1=10.23.240.93:2888:3888
server.2=10.23.244.224:2888:3888

At first, the cluster ran well. A few days ago, I shut down the zookeeper process on one of the servers (server.2), and today I find that the other two servers are in a weird state (the network is fine). The zookeeper processes use a lot of resources on the two servers:

on server.1 (it's the leader)
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26836 work 18 0 12.8g 803m 8724 S 3.7 10.1 195:56.56 java

$ ll /proc/26836/fd/ | wc -l
3586

[work@tc-test-aos03.tc.baidu.com conf]$ ll /proc/26836/task/ | wc -l
10510

some warning log:
2010-11-24 15:37:48,705 - WARN [Thread-37409:QuorumCnxManager$SendWorker@589] - Send worker leaving thread
2010-11-24 15:39:48,626 - WARN [Thread-37414:QuorumCnxManager$RecvWorker@658] - Connection broken:
java.nio.channels.AsynchronousCloseException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:185)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:263)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:629)
2010-11-24 15:39:48,656 - WARN [Thread-37413:QuorumCnxManager$SendWorker@581] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:570)
2010-11-24 15:39:48,657 - WARN [Thread-37413:QuorumCnxManager$SendWorker@589] - Send worker leaving thread
2010-11-24 15:41:48,614 - WARN [Thread-37417:QuorumCnxManager$SendWorker@581] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:570)
2010-11-24 15:41:48,643 - WARN [Thread-37418:QuorumCnxManager$RecvWorker@658] - Connection broken:
java.nio.channels.AsynchronousCloseException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:185)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:263)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:629)
2010-11-24 15:41:48,662 - WARN [Thread-37417:QuorumCnxManager$SendWorker@589] - Send worker leaving thread
2010-11-24 15:43:48,627 - WARN [Thread-37421:QuorumCnxManager$SendWorker@581] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:570)
2010-11-24 15:43:48,627 - WARN [Thread-37422:QuorumCnxManager$RecvWorker@658] - Connection broken:
java.nio.channels.AsynchronousCloseException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:185)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:263)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:629)
2010-11-24 15:43:48,654 - WARN [Thread-37421:QuorumCnxManager$SendWorker@589] - Send worker leaving thread
2010-11-24 15:44:48,622 - WARN [Thread-37424:QuorumCnxManager$RecvWorker@658] - Connection broken:
java.nio.channels.AsynchronousCloseException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:185)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:263)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:629)
2010-11-24 15:44:48,652 - WARN [Thread-37423:QuorumCnxManager$SendWorker@581] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:570)
2010-11-24 15:44:48,653 - WARN [Thread-37423:QuorumCnxManager$SendWorker@589] - Send worker leaving thread
2010-11-24 15:45:48,668 - WARN [Thread-37426:QuorumCnxManager$RecvWorker@658] - Connection broken:
java.io.IOException: Channel eof
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:630)
2010-11-24 15:46:48,647 - WARN [Thread-37427:QuorumCnxManager$SendWorker@581] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:570)
2010-11-24 15:46:48,722 - WARN [Thread-37428:QuorumCnxManager$RecvWorker@658] - Connection broken:
java.nio.channels.AsynchronousCloseException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:185)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:263)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:629)
2010-11-24 15:46:48,736 - WARN [Thread-37427:QuorumCnxManager$SendWorker@589] - Send worker leaving thread
2010-11-24 15:47:48,687 - WARN [Thread-37430:QuorumCnxManager$RecvWorker@658] - Connection broken:
java.io.IOException: Channel eof
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:630)


on server.0
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
27322 work 19 0 15.2g 943m 9140 S 38.6 11.8 1396:51 java

$ ll /proc/27322/fd/ | wc -l
3587

$ ll /proc/27322/task/ | wc -l
12938

2010-11-24 15:37:49,269 - WARN [Thread-37407:QuorumCnxManager$SendWorker@589] - Send worker leaving thread
2010-11-24 15:39:49,235 - WARN [Thread-37412:QuorumCnxManager$RecvWorker@658] - Connection broken:
java.io.IOException: Channel eof
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:630)
2010-11-24 15:39:49,410 - WARN [Thread-37411:QuorumCnxManager$SendWorker@581] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:570)
2010-11-24 15:39:49,411 - WARN [Thread-37411:QuorumCnxManager$SendWorker@589] - Send worker leaving thread
2010-11-24 15:41:49,314 - WARN [Thread-37416:QuorumCnxManager$RecvWorker@658] - Connection broken:
java.io.IOException: Channel eof
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:630)
2010-11-24 15:41:49,383 - WARN [Thread-37415:QuorumCnxManager$SendWorker@581] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:570)
2010-11-24 15:41:49,405 - WARN [Thread-37415:QuorumCnxManager$SendWorker@589] - Send worker leaving thread
2010-11-24 15:43:49,372 - WARN [Thread-37420:QuorumCnxManager$RecvWorker@658] - Connection broken:
java.io.IOException: Channel eof
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:630)
2010-11-24 15:43:49,512 - WARN [Thread-37419:QuorumCnxManager$SendWorker@581] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:570)
2010-11-24 15:43:49,513 - WARN [Thread-37419:QuorumCnxManager$SendWorker@589] - Send worker leaving thread
2010-11-24 15:44:49,407 - WARN [Thread-37422:QuorumCnxManager$RecvWorker@658] - Connection broken:
java.io.IOException: Channel eof
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:630)
2010-11-24 15:45:49,645 - WARN [Thread-37424:QuorumCnxManager$RecvWorker@658] - Connection broken:
java.nio.channels.AsynchronousCloseException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:185)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:263)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:629)
2010-11-24 15:45:49,781 - WARN [Thread-37423:QuorumCnxManager$SendWorker@581] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:570)
2010-11-24 15:45:49,799 - WARN [Thread-37423:QuorumCnxManager$SendWorker@589] - Send worker leaving thread
2010-11-24 15:46:49,495 - WARN [Thread-37427:QuorumCnxManager$RecvWorker@658] - Connection broken:
java.io.IOException: Channel eof
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:630)
2010-11-24 15:47:49,541 - WARN [Thread-37429:QuorumCnxManager$RecvWorker@658] - Connection broken:
java.nio.channels.AsynchronousCloseException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:185)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:263)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:629)
2010-11-24 15:47:49,622 - WARN [Thread-37428:QuorumCnxManager$SendWorker@581] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:570)
2010-11-24 15:47:49,622 - WARN [Thread-37428:QuorumCnxManager$SendWorker@589] - Send worker leaving thread
2010-11-24 15:48:48,827 - WARN [Thread-37431:QuorumCnxManager$RecvWorker@658] - Connection broken:
java.io.IOException: Channel eof
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:630)


What's more, the number of threads in the ZooKeeper process keeps increasing over time. It seems that something is wrong in the communication between the two servers. Has anyone met such a problem before?
62364 No Perforce job exists for this issue. 0 32803
8 years, 29 weeks, 2 days ago server 0|i05z7j:
ZooKeeper ZOOKEEPER-938

Support Kerberos authentication of clients.

New Feature Closed Major Fixed Eugene Joseph Koontz Eugene Joseph Koontz Eugene Joseph Koontz 23/Nov/10 12:25   28/Apr/14 00:03 18/Aug/11 18:05   3.4.0 java client, server   0 15   Support Kerberos authentication of clients.

The following usage would let an admin use Kerberos authentication to assign ACLs to authenticated clients.

1. Admin logs into ZooKeeper (not necessarily through Kerberos).

2. Admin decides that a new node called '/mynode' should be owned by the user 'zkclient', and that this user should have full permissions on it.

3. Admin does: zk> create /mynode content sasl:zkclient@FOOFERS.ORG:cdrwa

4. User 'zkclient' logs in to Kerberos using the command-line utility 'kinit'.

5. User connects to zookeeper server using a Kerberos-enabled version of zkClient (ZookeeperMain).

6. Behind the scenes, the client and server exchange authentication information. User is now authenticated as 'zkclient'.

7. User accesses /mynode with permissions 'cdrwa'.
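The ACL argument in step 3 has the form scheme:id:perms. The sketch below shows one way such a string could be split and its permission letters mapped to a bitmask. It assumes the bit values of ZooKeeper's ZooDefs.Perms (READ=1, WRITE=2, CREATE=4, DELETE=8, ADMIN=16); the class AclSpec is hypothetical and is not the CLI's own parser.

```java
/** Hypothetical helper: splits "scheme:id:perms" and maps the perms letters to a bitmask. */
final class AclSpec {
    // Bit values assumed to mirror org.apache.zookeeper.ZooDefs.Perms.
    static final int READ = 1, WRITE = 2, CREATE = 4, DELETE = 8, ADMIN = 16;

    final String scheme;
    final String id;
    final int perms;

    private AclSpec(String scheme, String id, int perms) {
        this.scheme = scheme;
        this.id = id;
        this.perms = perms;
    }

    static AclSpec parse(String spec) {
        int first = spec.indexOf(':');
        int last = spec.lastIndexOf(':');
        if (first < 0 || first == last) {
            throw new IllegalArgumentException("expected scheme:id:perms, got " + spec);
        }
        // The id is everything between the first and the last colon, so a
        // Kerberos principal such as zkclient@FOOFERS.ORG stays intact.
        String scheme = spec.substring(0, first);
        String id = spec.substring(first + 1, last);
        int perms = 0;
        for (char c : spec.substring(last + 1).toCharArray()) {
            switch (c) {
                case 'r': perms |= READ; break;
                case 'w': perms |= WRITE; break;
                case 'c': perms |= CREATE; break;
                case 'd': perms |= DELETE; break;
                case 'a': perms |= ADMIN; break;
                default: throw new IllegalArgumentException("unknown permission: " + c);
            }
        }
        return new AclSpec(scheme, id, perms);
    }
}
```

With this sketch, parsing "sasl:zkclient@FOOFERS.ORG:cdrwa" yields scheme "sasl", id "zkclient@FOOFERS.ORG" and all five permission bits set (31).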
47543 No Perforce job exists for this issue. 17 33369
8 years, 31 weeks, 6 days ago ZOOKEEPER-938 : support Kerberos authentication via SASL.
Reviewed
0|i062pb:
ZooKeeper ZOOKEEPER-937

test -e not available on solaris /bin/sh

Bug Closed Major Fixed Erik Hetzner Erik Hetzner Erik Hetzner 19/Nov/10 17:12   23/Nov/11 14:21 07/Dec/10 14:02 3.3.0, 3.3.1, 3.3.2 3.4.0 scripts   0 1   SunOS xxx 5.10 Generic_142901-13 i86pc i386 i86pc Solaris
test -e FILENAME is not supported by /bin/sh on Solaris. This is used in bin/zkEnv.sh. We can substitute test -f FILENAME. Attaching a patch. 47544 No Perforce job exists for this issue. 2 32804
9 years, 16 weeks ago
Reviewed
0|i05z7r:
ZooKeeper ZOOKEEPER-936

zkpython is leaking ACL_vector

Bug Open Major Unresolved Unassigned Gustavo Niemeyer Gustavo Niemeyer 18/Nov/10 11:02   14/Dec/19 06:08     3.7.0 contrib-bindings   0 3   It looks like there are no calls to deallocate_ACL_vector() within zookeeper.c in the zkpython binding, which means that (at least) the result of zoo_get_acl() must be leaking. 36644 No Perforce job exists for this issue. 0 32805
8 years, 41 weeks, 1 day ago 0|i05z7z:
ZooKeeper ZOOKEEPER-935

Concurrent primitives library - shared lock

Improvement Open Minor Unresolved Chia-Hung Lin Chia-Hung Lin Chia-Hung Lin 18/Nov/10 04:33   05/Feb/20 07:15     3.7.0, 3.5.8 recipes   0 3   Debian squeeze
JDK 1.6.x
zookeeper trunk
I created this JIRA to add a shared lock function. The function follows the recipe at http://hadoop.apache.org/zookeeper/docs/r3.1.2/recipes.html#Shared+Locks
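The core decision in the shared-lock recipe, as we understand it, is: a reader holds the lock when no "write-" node has a lower sequence number than its own, and a writer holds it when no node at all has a lower sequence number; otherwise the client watches the next-lowest blocking node. A minimal sketch of that decision as a pure function, assuming children are named "read-<seq>" or "write-<seq>" with a numeric suffix (the class name is hypothetical, and the actual ZooKeeper calls for creating nodes and setting watches are left out):

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical shared-lock decision: which node (if any) our node must wait on. */
final class SharedLockDecision {
    /** Extracts the sequence number after the last '-' in a child name. */
    static long seq(String name) {
        return Long.parseLong(name.substring(name.lastIndexOf('-') + 1));
    }

    /** Returns empty if the lock is held, otherwise the child to watch. */
    static Optional<String> nodeToWatch(String mine, List<String> children) {
        boolean reader = mine.startsWith("read-");
        long mySeq = seq(mine);
        String best = null;
        long bestSeq = Long.MIN_VALUE;
        for (String child : children) {
            long s = seq(child);
            // Readers only wait for earlier writers; writers wait for any earlier node.
            boolean blocks = s < mySeq && (!reader || child.startsWith("write-"));
            // Keep the blocking node closest below ours ("next lowest").
            if (blocks && s > bestSeq) {
                best = child;
                bestSeq = s;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

Watching only the next-lowest blocking node, rather than all of them, is what keeps a release from waking every waiting client at once.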

36 No Perforce job exists for this issue. 1 42090
8 years, 16 weeks ago zookeeper, shared lock, recipes, lock 0|i07kif:
ZooKeeper ZOOKEEPER-934

ZOOKEEPER-900 Add sanity check for server ID

Sub-task Open Major Unresolved Unassigned Vishal Kher Vishal Kher 17/Nov/10 11:14   05/Feb/20 07:16     3.7.0, 3.5.8     0 0   2. Should I add a check to reject connections from peers that are not
listed in the configuration file? Currently, we are not doing any
sanity check for server IDs. I think this might fix ZOOKEEPER-851.
The fix is simple. However, I am not sure if anyone in the community
is relying on this ability.
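The proposed sanity check amounts to a membership test against the server IDs listed in the configuration file. A minimal sketch, assuming those IDs have already been collected into a set (the class PeerIdCheck is hypothetical); note that such a check would also reject a connection announcing the QuorumPeer.OBSERVER_ID wildcard discussed in ZOOKEEPER-933, since that value is not a configured server ID.

```java
import java.util.Set;

/** Hypothetical sanity check: keep a quorum connection only if its sid is configured. */
final class PeerIdCheck {
    // Server ids taken from the configuration file (assumed already parsed).
    private final Set<Long> configuredSids;

    PeerIdCheck(Set<Long> configuredSids) {
        this.configuredSids = configuredSids;
    }

    /** Returns true if the connection from remoteSid should be kept. */
    boolean accept(long remoteSid) {
        return configuredSids.contains(remoteSid);
    }
}
```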
36645 No Perforce job exists for this issue. 0 42091
9 years, 18 weeks, 6 days ago 0|i07kin:
ZooKeeper ZOOKEEPER-933

ZOOKEEPER-900 Remove wildcard QuorumPeer.OBSERVER_ID

Sub-task Open Major Unresolved Unassigned Vishal Kher Vishal Kher 17/Nov/10 11:11   05/Feb/20 07:16     3.7.0, 3.5.8     0 1   1. I have a question about the following piece of code in QCM:

if (remoteSid == QuorumPeer.OBSERVER_ID) {
    /*
     * Choose identifier at random. We need a value to identify
     * the connection.
     */
    remoteSid = observerCounter--;
    LOG.info("Setting arbitrary identifier to observer: " + remoteSid);
}

Should we allow this? The problem with this code is that if a peer
connects twice with QuorumPeer.OBSERVER_ID, we will end up creating
threads for this peer twice. This could result in redundant
SendWorker/RecvWorker threads.

I haven't used observers yet. The documentation
http://hadoop.apache.org/zookeeper/docs/r3.3.0/zookeeperObservers.html
says that just like followers, observers should have server IDs. In
which case, why do we want to provide a wild-card?
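The concern above can be illustrated with a simplified model of a table of worker pairs keyed by server ID. This is not the real QuorumCnxManager: the class WorkerRegistry is hypothetical, and the values OBSERVER_ID = Long.MAX_VALUE and a counter starting at -1 are assumptions. The point it shows is that each wildcard connection receives a fresh ID, so a reconnect from the same observer adds a second entry instead of replacing the first.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified model (not QuorumCnxManager) of worker pairs keyed by remote sid. */
final class WorkerRegistry {
    static final long OBSERVER_ID = Long.MAX_VALUE; // assumed wildcard value
    private long observerCounter = -1;               // assumed starting value
    // One entry stands for one SendWorker/RecvWorker pair.
    final Map<Long, String> workersBySid = new HashMap<>();

    long register(long remoteSid, String peer) {
        if (remoteSid == OBSERVER_ID) {
            // Mirrors the quoted code: every wildcard connection gets a new id,
            // so a reconnect from the same observer is never recognized.
            remoteSid = observerCounter--;
        }
        // With a real sid, a reconnect overwrites the previous entry.
        workersBySid.put(remoteSid, peer);
        return remoteSid;
    }
}
```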
36646 No Perforce job exists for this issue. 0 42092
9 years, 18 weeks, 6 days ago 0|i07kiv:
ZooKeeper ZOOKEEPER-932

ZOOKEEPER-900 Move blocking read/write calls to SendWorker and RecvWorker Threads

Sub-task Open Major Unresolved Vishal Kher Vishal Kher Vishal Kher 17/Nov/10 11:08   05/Feb/20 07:16   3.3.2 3.7.0, 3.5.8 leaderElection   0 1   Copying relevant comments:

Vishal K added a comment - 02/Nov/10 02:09 PM
Hi Flavio,

I have a suggestion for changing the blocking IO code in QuorumCnxManager. It keeps the current code structure and requires a small amount of changes. I am not sure if these comments should go in ZOOKEEPER-901. ZOOKEEPER-901 is probably addressing netty as well. Please feel free to close this JIRA if you intend to make all the changes as a part of ZOOKEEPER-901.

Basically, we just need to move parts of initiateConnection and receiveConnection to SenderWorker and ReceiveWorker.

A. Current flow for receiving connection:
1. accept connection in Listener.run()
2. receiveConnection()

* Read remote server's ID
* Take action based on my ID and remote server's ID (disconnect and reconnect if my ID is > remote server's ID).
* kill current set of SenderWorker and ReceiveWorker threads
* Start a new pair

B. Current flow for initiating connection:
1. In connectOne(), connect if not already connected. else return.
2. send my ID to the remote server
3. if my ID < remote server disconnect and return
4. if my ID > remote server

* kill current set of SenderWorker and ReceiveWorker threads for the remote server
* Start a new pair

Proposed changes:
Move the code that performs blocking IO into SenderWorker and ReceiveWorker.

A. Proposed flow for receiving connection:
1. accept connection in Listener.run()
2. receiveConnection()

* kill current set of SenderWorker and ReceiveWorker threads
* Start a new pair

Proposed changes to SenderWorker:

* Read remote server's ID
* Take action based on my ID and remote server's ID (disconnect and reconnect if my ID is > remote server's ID).
* Proceed to normal operation

B. Proposed flow for initiating connection:
1. in connectOne(), return if already connected
2. Start a new SenderWorker and ReceiveWorker pair
3. In SenderWorker

* connect to remote server
* write my ID
* if my ID < remote server disconnect and return (shutdown the pair).
* Proceed to normal operation
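Both the current and the proposed flows rely on the same ID comparison: a connection survives only if it was initiated by the server with the larger ID. A minimal sketch of that rule as two pure functions; the class ConnectionRule, its enum and method names are hypothetical and are not taken from QuorumCnxManager.

```java
/** Hypothetical tie-break rule for the quorum connection handshake. */
final class ConnectionRule {
    enum Action { KEEP, DROP_AND_DIAL_BACK, DROP }

    /** Side that accepted a connection and has read the remote server's id. */
    static Action onAccepted(long myId, long remoteId) {
        // If we have the larger id, close the incoming socket and initiate our own.
        return myId > remoteId ? Action.DROP_AND_DIAL_BACK : Action.KEEP;
    }

    /** Side that initiated a connection and has written its own id. */
    static Action onInitiated(long myId, long remoteId) {
        // The smaller id gives up; the remote side will dial us instead.
        return myId < remoteId ? Action.DROP : Action.KEEP;
    }
}
```

Whichever side starts the exchange, only the connection dialed by the larger ID ends up being kept, which is why the proposal can move this logic into the worker threads without changing its outcome.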

Questions:

* In QuorumCnxManager, is it necessary to kill the current pair and restart a new one every time we receive a connect request?
* In receiveConnection we may choose to reject an accepted connection if a thread in
SenderWorker is in the process of connecting. Otherwise, a server with ID <
remote server may keep sending frequent connect requests that will result in the
remote server closing connections for this peer. But I think we add a delay
before sending notifications, which might be good enough to prevent this
problem.

Let me know what you think about this. I can also help with the implementation.

Flavio Junqueira added a comment - 03/Nov/10 05:28 PM
Hi Vishal, I like your proposal, it seems reasonable and not difficult to implement.

On your questions:

1. I don't think it is necessary to kill a pair SenderWorker/RecvWorker every time, and I'd certainly support changing it;
2. I'm not sure where you're suggesting to introduce a delay. In the FLE code, a server sends a new batch of notifications if it changes its vote or if it times out waiting for a new notification. This timeout value increases over time. I was actually thinking that we should reset the timeout value upon receiving a notification. I think this is a bug....
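The timeout behaviour described in point 2, together with the suggested reset, can be sketched as a small policy object: the wait doubles on each expiry up to a cap and returns to its initial value when a notification arrives. The class NotificationTimeout, the initial value and the cap are illustrative assumptions, not FLE's actual fields or constants.

```java
/** Hypothetical notification timeout: doubles on expiry, resets on a notification. */
final class NotificationTimeout {
    private final int initialMs;
    private final int maxMs;
    private int currentMs;

    NotificationTimeout(int initialMs, int maxMs) {
        this.initialMs = initialMs;
        this.maxMs = maxMs;
        this.currentMs = initialMs;
    }

    int current() {
        return currentMs;
    }

    /** No notification arrived within current(): back off, but never past the cap. */
    void onTimeout() {
        currentMs = Math.min(currentMs * 2, maxMs);
    }

    /** A notification arrived: the suggested fix restarts from the initial value. */
    void onNotification() {
        currentMs = initialMs;
    }
}
```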

Given that it is your proposal, I'd be happy to let you take a stab at it and help you out if you need a hand. Does it make sense for you?
66890 No Perforce job exists for this issue. 5 42093
8 years, 35 weeks, 2 days ago 0|i07kj3:
ZooKeeper ZOOKEEPER-930

Hedwig c++ client uses a non thread safe logging library

Bug Resolved Major Fixed Ivan Kelly Ivan Kelly Ivan Kelly 15/Nov/10 04:55   17/Nov/10 05:55 16/Nov/10 13:28 3.3.2   contrib-hedwig   0 1   47545 No Perforce job exists for this issue. 2 32806
9 years, 19 weeks, 1 day ago
Reviewed
0|i05z87:
ZooKeeper ZOOKEEPER-929

hudson qabot incorrectly reporting issues as number 909 when the patch from 908 is the one being tested

Bug Resolved Major Cannot Reproduce Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 11/Nov/10 12:03   08/Oct/13 17:56 08/Oct/13 17:56     build   0 0   Hi Nigel can you take a look at this?

Below is the email I got. Notice that the patch is the 908 patch; however, the Hudson page linked to the change documents the 909 patch file as the one applied:
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/25/changes

I looked at both JIRAs, ZOOKEEPER-908 and ZOOKEEPER-909, and both look good (the right names on the patches), and qabot actually updated 908 with the comment (failure). However, the "change" is listed as 909, which is wrong.


[exec] -1 overall. Here are the results of testing the latest attachment
[exec] http://issues.apache.org/jira/secure/attachment/12459361/ZOOKEEPER-908.patch
[exec] against trunk revision 1033770.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
[exec] Please justify why no new tests are needed for this patch.
[exec] Also please list what manual steps were performed to verify this patch.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
[exec]
[exec] +1 core tests. The patch passed core unit tests.
[exec]
[exec] +1 contrib tests. The patch passed contrib unit tests.
[exec]
[exec] Test results: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/25//testReport/
[exec] Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/25//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
[exec] Console output: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/25//console
[exec]
[exec] This message is automatically generated.
[exec]
214205 No Perforce job exists for this issue. 0 32807
9 years, 15 weeks, 4 days ago 0|i05z8f:
ZooKeeper ZOOKEEPER-928

Follower should stop following and start FLE if it does not receive pings from the leader

Bug Resolved Critical Won't Fix Unassigned Vishal Kher Vishal Kher 10/Nov/10 15:06   11/Nov/10 12:07 10/Nov/10 16:40 3.3.2   quorum, server   0 2   In Follower.followLeader() after syncing with the leader, the follower does:
while (self.isRunning()) {
readPacket(qp);
processPacket(qp);
}

It looks like it relies on socket timeout expiry to figure out whether the connection with the leader has gone down. So a follower *with no clients* may never notice a faulty leader if the leader has a software hang while the TCP connections with the peers are still valid. Since the follower has no clients, it won't heartbeat with the leader. If a majority of followers are not connected to any clients, then FLE will fail even if other followers attempt to elect a new leader.

We should keep track of pings received from the leader and, if we haven't seen
a ping packet from the leader for (syncLimit * tickTime) ms, give up following the
leader.
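The proposed tracking can be sketched as a watchdog that records the time of the last ping from the leader and reports the leader as lost once more than syncLimit * tickTime ms have passed. The class LeaderPingWatchdog is hypothetical, and time is passed in explicitly to keep the sketch testable; in practice the read loop above would also need a bounded read so that it gets a chance to check the watchdog.

```java
/** Hypothetical follower-side watchdog for pings from the leader. */
final class LeaderPingWatchdog {
    private final long limitMs;
    private long lastPingMs;

    LeaderPingWatchdog(int syncLimit, int tickTimeMs, long nowMs) {
        this.limitMs = (long) syncLimit * tickTimeMs;
        this.lastPingMs = nowMs; // treat the moment following begins as the last ping
    }

    void onPing(long nowMs) {
        lastPingMs = nowMs;
    }

    /** True if the follower should stop following and start leader election. */
    boolean leaderLost(long nowMs) {
        return nowMs - lastPingMs > limitMs;
    }
}
```

For example, with syncLimit = 5 and tickTime = 2000 ms, the follower would give up after 10 seconds without a ping.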
214204 No Perforce job exists for this issue. 0 32808
9 years, 20 weeks ago 0|i05z8n:
ZooKeeper ZOOKEEPER-927

there are currently 24 RAT warnings in the build -- address directly or via exclusions

Improvement Resolved Minor Fixed Michi Mutsuzaki Patrick D. Hunt Patrick D. Hunt 09/Nov/10 14:00   19/Jul/14 07:24 19/Jul/14 00:40   3.5.0 build   0 3   We should either fix these, or add exclusions to build.xml.

afaik the current warnings are not real errors/problems, but we should address this directly. (I eyeball it before every release)
36647 No Perforce job exists for this issue. 3 42094
5 years, 35 weeks, 5 days ago
Reviewed
0|i07kjb:
ZooKeeper ZOOKEEPER-926

Fork Hadoop common's test-patch.sh and modify for Zookeeper

Improvement Closed Major Fixed Nigel Daley Nigel Daley Nigel Daley 09/Nov/10 02:43   10/Dec/15 21:54 10/Nov/10 01:23   3.4.0 build   0 0   Zookeeper currently uses the test-patch.sh script from the Hadoop nightly dir. This is now out of date. I propose we just copy the updated one in Hadoop common and then modify it for ZK. This will also help as ZK moves out of Hadoop to its own TLP. 47546 No Perforce job exists for this issue. 1 33370
9 years, 18 weeks, 5 days ago 0|i062pj:
ZooKeeper ZOOKEEPER-925

Consider maven site generation to replace our forrest site and documentation generation

Task Closed Major Fixed Tamas Penzes Patrick D. Hunt Patrick D. Hunt 08/Nov/10 20:53   02/Apr/19 06:40 07/Dec/18 06:34 3.5.4, 3.6.0, 3.4.13 3.6.0, 3.5.5, 3.4.14 documentation   0 7   ZOOKEEPER-3153, ZOOKEEPER-3154, ZOOKEEPER-3155, ZOOKEEPER-3184 See WHIRR-19 for some background.

In Whirr we looked at a number of site/doc generation facilities. In the end, the Maven site generation plugin turned out to be by far the best option. You can see our nascent site here (no attempt at styling, etc. so far):
http://incubator.apache.org/whirr/

In particular take a look at the quick start:
http://incubator.apache.org/whirr/quick-start-guide.html
which was generated from
http://svn.apache.org/repos/asf/incubator/whirr/trunk/src/site/confluence/quick-start-guide.confluence
notice this was standard wiki markup (confluence wiki markup, same as available from apache)

You can read more about mvn site plugin here:
http://maven.apache.org/guides/mini/guide-site.html
Notice that other formats are available, not just Confluence markup. Also note that you can mix different markup formats in the same site if you like (probably not a great idea in general, but handy in some cases; for example, Whirr uses the Confluence wiki, so we can pretty much copy/paste source docs from the wiki to our site (svn) if we like).


Re maven vs our current ant based build. It's probably a good idea for us to move the build to maven at some point. We could initially move just the doc generation, and then incrementally move functionality from build.xml to mvn over a longer time period.
100% 69000 0 214203 No Perforce job exists for this issue. 3 42095
1 year, 14 weeks, 6 days ago 0|i07kjj:
ZooKeeper ZOOKEEPER-924

Recipe: Fault tolerant communication layer using Zookeeper

Task Open Major Unresolved Unassigned kishore gopalakrishna kishore gopalakrishna 08/Nov/10 15:18   08/Nov/10 15:20       recipes   0 0   Any This recipe caters to the following use case
There are S (active) + s (standby) sender nodes and R (active) + r (standby) receiver nodes. The objectives are the following:

* If one of the S active servers goes down, a standby node should take over its task.
* If one of the R active servers goes down, a standby node should take over its task.
* When there is a change in receivers, the senders must be updated and send messages to the correct destination.

This also uses recipe described in https://issues.apache.org/jira/browse/ZOOKEEPER-923

This was developed for a different project S4 which is also open sourced http://s4.io/. The communication layer and task management layer is completely independent of S4 and can be used in any application.



36648 No Perforce job exists for this issue. 0 42096
9 years, 20 weeks, 3 days ago 0|i07kjr:
ZooKeeper ZOOKEEPER-923

TaskManagement Using Zookeeper Recipe

Task Open Major Unresolved Unassigned kishore gopalakrishna kishore gopalakrishna 08/Nov/10 15:17   08/Nov/10 15:18       recipes   0 0   Any A typical use case in distributed systems is: "There are T tasks and P processes running, but only T processes must be active at any time [P > T]; the remaining P-T processes act as standbys, ready to take up a task when one or more active processes fail."

ZooKeeper provides an excellent service that can be used to coordinate among the P processes, and using the locking mechanism we can ensure that there are always T processes active. Without a central coordinating service there will generally be 2T processes [i.e. at least one backup for each process]. With ZooKeeper we can choose P based on the failure rate.

The assumptions here are:
1. At any time we have P > T. P can be chosen appropriately based on the failure rate.
2. The tasks are stateless. That is, any process P_i that takes up a task T_j does not know the state of the process P_k that previously processed T_j. This is not entirely true, and there are ways to overcome this drawback on a case-by-case basis.
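One simple way to decide which of the P processes are active is to have each process register an ephemeral sequential node and let the T members with the lowest sequence numbers run tasks 0..T-1. The sketch below is an illustrative simplification, not the recipe attached to this issue: the class TaskAssignment and the "name-<seq>" node format are hypothetical. Because ranks shift when an earlier member dies, a surviving process may move to a different task, which is only acceptable under assumption 2 (stateless tasks).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical rank-based assignment: the T lowest sequence numbers are active. */
final class TaskAssignment {
    /** Extracts the sequence number after the last '-' in a member node name. */
    static long seq(String name) {
        return Long.parseLong(name.substring(name.lastIndexOf('-') + 1));
    }

    /** Returns the task index this member should run, or -1 if it is a standby. */
    static int taskFor(String mine, List<String> members, int tasks) {
        List<String> sorted = new ArrayList<>(members);
        sorted.sort(Comparator.comparingLong(TaskAssignment::seq));
        int rank = sorted.indexOf(mine);
        return (rank >= 0 && rank < tasks) ? rank : -1;
    }
}
```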


This was developed for a different project S4 which is also open sourced http://s4.io/. The communication layer and task management layer is completely independent of S4 and can be used in any application.



36649 No Perforce job exists for this issue. 0 42097
9 years, 20 weeks, 3 days ago 0|i07kjz:
ZooKeeper ZOOKEEPER-922

enable faster timeout of sessions in case of unexpected socket disconnect

Improvement Open Major Unresolved Camille Fournier Camille Fournier Camille Fournier 08/Nov/10 10:43   05/Feb/20 07:15     3.7.0, 3.5.8 server   2 9   In the case when a client connection is closed due to socket error instead of the client calling close explicitly, it would be nice to enable the session associated with that client to time out faster than the negotiated session timeout. This would enable a zookeeper ensemble that is acting as a dynamic discovery provider to remove ephemeral nodes for crashed clients quickly, while allowing for a longer heartbeat-based timeout for java clients that need to do long stop-the-world GC.

I propose doing this by setting the timeout associated with the crashed session to "minSessionTimeout".
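The proposal can be sketched as a rule applied when a connection goes away: a session whose socket failed unexpectedly gets minSessionTimeout, while any other session keeps its negotiated timeout. A long stop-the-world GC does not close the socket, so such clients would still be covered by the longer heartbeat-based timeout. The class SessionExpiryPolicy is hypothetical; it assumes the server default minSessionTimeout = 2 * tickTime.

```java
/** Hypothetical rule for shortening the expiry of sessions lost to socket errors. */
final class SessionExpiryPolicy {
    private final int minSessionTimeoutMs;

    SessionExpiryPolicy(int tickTimeMs) {
        // Assumes the default minSessionTimeout of 2 * tickTime.
        this.minSessionTimeoutMs = 2 * tickTimeMs;
    }

    /** Timeout to apply once the client's connection is gone. */
    int timeoutAfterDisconnect(int negotiatedTimeoutMs, boolean socketError) {
        // Socket errors shorten the session to minSessionTimeout; otherwise
        // the negotiated timeout still applies.
        return socketError ? Math.min(negotiatedTimeoutMs, minSessionTimeoutMs) : negotiatedTimeoutMs;
    }
}
```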
70759 No Perforce job exists for this issue. 1 42098
9 years, 7 weeks, 1 day ago 0|i07kk7:
ZooKeeper ZOOKEEPER-921

zkPython incorrectly checks for existence of required ACL elements

Bug Closed Major Fixed Nicholas Knight Nicholas Knight Nicholas Knight 08/Nov/10 03:51   23/Nov/11 14:22 28/Dec/10 19:46 3.3.1, 3.4.0 3.3.3, 3.4.0 contrib-bindings   0 2   Mac OS X 10.6.4, included Python 2.6.1 Calling {{zookeeper.create()}} seems, under certain circumstances, to be corrupting a subsequent call to Python's {{logging}} module.

Specifically, if the node does not exist (but its parent does), I end up with a traceback like this when I try to make the logging call:

{noformat}
Traceback (most recent call last):
File "zktest.py", line 21, in <module>
logger.error("Boom?")
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/logging/__init__.py", line 1046, in error
if self.isEnabledFor(ERROR):
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/logging/__init__.py", line 1206, in isEnabledFor
return level >= self.getEffectiveLevel()
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/logging/__init__.py", line 1194, in getEffectiveLevel
while logger:
TypeError: an integer is required
{noformat}

But if the node already exists, or the parent does not exist, I get the appropriate NodeExists or NoNode exceptions.

I'll be attaching a test script that can be used to reproduce this behavior.
47547 No Perforce job exists for this issue. 2 32809
9 years, 13 weeks, 1 day ago
Reviewed
0|i05z8v:
ZooKeeper ZOOKEEPER-920

L7 (application layer) ping support

New Feature Open Minor Unresolved Chang Song Chang Song Chang Song 08/Nov/10 02:53   05/Feb/20 07:16   3.3.1 3.7.0, 3.5.8 c client   1 4   ZooKeeper is used in applications where fault tolerance is important. Its client I/O thread sends/receives heartbeats to/from the ZooKeeper ensemble to stay connected. However, a healthy heartbeat does not always mean that the application using the ZooKeeper client is in good health; it only means that the ZK client thread is in good health.

Thus I needed something that can be tagged onto the ZooKeeper ping to represent L7 (application) health as well.
I have modified the C client source to support this in a minimal way. I am new to ZooKeeper, so please review this code. I am actually using this code in our in-house solution.

https://github.com/tru64ufs/zookeeper/commit/2196d6d5114a2fd2c0a3bc9a55f4494d47d2aece

Thank you very much.

70773 No Perforce job exists for this issue. 1 42099
7 years, 1 week, 2 days ago 0|i07kkf:
ZooKeeper ZOOKEEPER-919

Ephemeral nodes remains in one of ensemble after deliberate SIGKILL

Bug Closed Blocker Duplicate Unassigned Chang Song Chang Song 04/Nov/10 09:43   23/Nov/11 14:22 18/Nov/11 20:11 3.3.1 3.3.3, 3.4.0 server   0 2   Linux CentOS 5.3 64bit, JDK 1.6.0-22
SLES 11
I was testing the stability of a ZooKeeper ensemble for production deployment, using a three-node ensemble configuration.
In a loop, I killed and restarted three ZooKeeper clients that created one ephemeral node each, and at the same time
I killed the Java process on one of the ensemble servers (I don't know whether it was the leader or not). Then I restarted ZooKeeper on that server.

It turns out that on two ZooKeeper ensemble servers all the ephemeral nodes are gone (as they should be), but on the newly started
ZooKeeper server the two old ephemeral nodes stayed. The restarted server did not come up in standalone mode, since new ephemeral
nodes get created on all ensemble servers.
I captured the log.


2010-11-04 17:48:50,201 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:17288:NIOServerCnxn$Factory@250] - Accepted socket connection from /10.25.131.21:11191
2010-11-04 17:48:50,202 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:17288:NIOServerCnxn@776] - Client attempting to establish new session at /10.25.131.21:11191
2010-11-04 17:48:50,203 - INFO [CommitProcessor:1:NIOServerCnxn@1579] - Established session 0x12c160c31fc000b with negotiated timeout 30000 for client /10.25.131.21:11191
2010-11-04 17:48:50,206 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:17288:NIOServerCnxn@633] - EndOfStreamException: Unable to read additional data from client sessionid 0x12c160c31fc000b, likely client has closed socket
2010-11-04 17:48:50,207 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:17288:NIOServerCnxn@1434] - Closed socket connection for client /10.25.131.21:11191 which had sessionid 0x12c160c31fc000b
2010-11-04 17:48:50,207 - ERROR [CommitProcessor:1:NIOServerCnxn@444] - Unexpected Exception:
java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:417)
at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1508)
at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:367)
at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73)
47548 No Perforce job exists for this issue. 4 32810
9 years, 9 weeks, 1 day ago 0|i05z93:
ZooKeeper ZOOKEEPER-917

Leader election selected incorrect leader

Bug Resolved Critical Not A Problem Unassigned Alexandre Hardy Alexandre Hardy 03/Nov/10 08:33   18/Nov/11 20:01 04/Nov/10 08:40 3.2.2   leaderElection, server   0 1   Cloudera distribution of zookeeper (patched to never cache DNS entries)
Debian lenny
We had three nodes running zookeeper:
* 192.168.130.10
* 192.168.130.11
* 192.168.130.14

192.168.130.11 failed and was replaced by a new node, 192.168.130.13 (automated startup). The new node had not participated in any ZooKeeper quorum previously. The node 192.168.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off).

DNS entries were updated for the new node to allow all the zookeeper servers to find the new node.

The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid.

This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet.
214202 No Perforce job exists for this issue. 1 32811
9 years, 20 weeks, 2 days ago 0|i05z9b:
ZooKeeper ZOOKEEPER-916

Problem receiving messages from subscribed channels in c++ client

Bug Resolved Major Fixed Ivan Kelly Ivan Kelly Ivan Kelly 03/Nov/10 05:10   05/Nov/10 06:52 05/Nov/10 02:45     contrib-hedwig   0 1   We see this bug with receiving messages from a subscribed channel. This problem seems to happen with larger messages. The flow is to first read at least 4 bytes from the socket channel. Extract the first 4 bytes to get the message size. If we've read enough data into the buffer already, we're done so invoke the messageReadCallbackHandler passing the channel and message size. If not, then do an async read for at least the remaining amount of bytes in the message from the socket channel. When done, invoke the messageReadCallbackHandler.

The problem seems that when the second async read is done, the same sizeReadCallbackHandler is invoked instead of the messageReadCallbackHandler. The result is that we then try to read the first 4 bytes again from the buffer. This will get a random message size and screw things up. I'm not sure if it's an incorrect use of the boost asio async_read function or we're doing the boost bind to the callback function incorrectly.


101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler system:0,512 channel(0x80b7a18)
101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler: size of buffer before reading message size: 512 channel(0x80b7a18)
101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler: size of incoming message 599, currently in buffer 508 channel(0x80b7a18)
101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler: Still have more data to read, 91 from channel(0x80b7a18)
101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler system:0, 91 channel(0x80b7a18)
101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler: size of buffer before reading message size: 599 channel(0x80b7a18)
101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler: size of incoming message 134287360, currently in buffer 595 channel(0x80b7a18)
101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler: Still have more data to read, 134286765 from channel(0x80b7a18)
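The intended flow is a two-state decoder: read a 4-byte size, then wait until that many body bytes are available before reading the next size. The Java sketch below illustrates that state machine; it is not the Hedwig C++ code. The class FrameDecoder, the big-endian size prefix and the fixed 1 MB buffer are assumptions. Fed the same chunks as in the log (512 bytes, then 91), it returns one 599-byte message instead of re-reading a size from the middle of the body.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical length-prefixed frame decoder (size prefix, then body). */
final class FrameDecoder {
    private final ByteBuffer buf = ByteBuffer.allocate(1 << 20); // assumed max buffered bytes
    private int pendingSize = -1; // -1: next we need a size prefix

    /** Appends newly received bytes and returns every complete message. */
    List<byte[]> feed(byte[] data) {
        buf.put(data);
        buf.flip(); // switch to reading what has been buffered so far
        List<byte[]> out = new ArrayList<>();
        while (true) {
            if (pendingSize < 0) {
                if (buf.remaining() < 4) break;
                pendingSize = buf.getInt(); // assumed big-endian size prefix
            }
            // Body incomplete: remember pendingSize so the next call reads the
            // body, not a new size (the confusion described in this issue).
            if (buf.remaining() < pendingSize) break;
            byte[] msg = new byte[pendingSize];
            buf.get(msg);
            out.add(msg);
            pendingSize = -1;
        }
        buf.compact(); // keep unread bytes and switch back to writing
        return out;
    }
}
```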
47549 No Perforce job exists for this issue. 1 32812
9 years, 20 weeks, 6 days ago
Reviewed
0|i05z9j:
ZooKeeper ZOOKEEPER-915

Errors that happen during sync() processing at the leader do not get propagated back to the client.

Bug In Progress Major Unresolved gaoshu Benjamin Reed Benjamin Reed 28/Oct/10 18:43   05/Feb/20 07:11     3.7.0, 3.5.8     1 3 0 600   If an error in sync() processing happens at the leader (SESSION_MOVED for example), they are not propagated back to the client. 100% 100% 600 0 pull-request-available 36650 No Perforce job exists for this issue. 0 32813
2 years, 29 weeks, 1 day ago 0|i05z9r:
ZooKeeper ZOOKEEPER-914

QuorumCnxManager blocks forever

Bug Resolved Blocker Duplicate Vishal Kher Vishal Kher Vishal Kher 27/Oct/10 15:54   12/Nov/10 17:47 12/Nov/10 17:47     leaderElection   0 1   This was a disaster. While testing our application, we ran into a scenario where a rebooted follower could not join the cluster. Further debugging showed that the follower could not join because the QuorumCnxManager on the leader was blocked for an indefinite amount of time in receiveConnection():

"Thread-3" prio=10 tid=0x00007fa920005800 nid=0x11bb runnable [0x00007fa9275ed000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
at sun.nio.ch.IOUtil.read(IOUtil.java:206)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
- locked <0x00007fa93315f988> (a java.lang.Object)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:210)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:501)

I had pointed out this bug along with several other problems in QuorumCnxManager earlier in
https://issues.apache.org/jira/browse/ZOOKEEPER-900 and https://issues.apache.org/jira/browse/ZOOKEEPER-822.

I forgot to patch this one as a part of ZOOKEEPER-822. I am working on a fix and a patch will be out soon.

The problem is that QuorumCnxManager is using SocketChannel in blocking mode. It does a read() in receiveConnection() and a write() in initiateConnection().

Sorry, but this is really bad programming. It also points to a lack of failure tests for QuorumCnxManager.
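A common mitigation for a listener that reads from a peer that connected but never sends is to bound the read with a socket timeout. As far as we know, `SO_TIMEOUT` is not honored by `SocketChannel.read()` in blocking mode. It does apply to reads through `java.net.Socket.getInputStream()`. The sketch assumes that, as in the 3.3-era code, the connecting peer first sends its server id as an 8-byte long. `PeerHandshake` is hypothetical and is not the patch for this issue.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.Socket;

/** Hypothetical handshake helper for an incoming quorum connection. */
final class PeerHandshake {
    /**
     * Reads the 8-byte server id a connecting peer is assumed to send first.
     * Throws SocketTimeoutException after timeoutMs instead of blocking forever,
     * and EOFException if the peer closes the connection early.
     */
    static long readPeerId(Socket sock, int timeoutMs) throws IOException {
        sock.setSoTimeout(timeoutMs); // bounds every blocking read on this socket
        DataInputStream in = new DataInputStream(sock.getInputStream());
        return in.readLong();
    }
}
```

After the handshake, the caller can restore a different timeout, for example `setSoTimeout(0)`, for the long-lived worker threads.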
214201 No Perforce job exists for this issue. 0 32814
9 years, 19 weeks, 6 days ago 0|i05z9z:
ZooKeeper ZOOKEEPER-913

Version parser fails to parse "3.3.2-dev" from build.xml.

Bug Closed Critical Fixed Patrick D. Hunt Anthony Urso Anthony Urso 26/Oct/10 02:50   23/Nov/11 14:22 27/Jan/11 02:45 3.3.1 3.3.3, 3.4.0 build   0 2   Cannot build 3.3.1 from the release tarball due to the VerGen parser's inability to parse "3.3.2-dev".

version-info:
[java] All version-related parameters must be valid integers!
[java] Exception in thread "main" java.lang.NumberFormatException: For input string: "2-dev"
[java] at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
[java] at java.lang.Integer.parseInt(Integer.java:481)
[java] at java.lang.Integer.parseInt(Integer.java:514)
[java] at org.apache.zookeeper.version.util.VerGen.main(VerGen.java:131)
[java] Java Result: 1
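The trace shows `VerGen` calling `Integer.parseInt` on the third dot-separated component, "2-dev". One way to accept such strings is to split off the qualifier before parsing the numeric parts. This is a minimal sketch with a hypothetical `ReleaseVersion` class. It is neither the VerGen code nor the committed fix.

```java
/** Hypothetical parser for "x.y.z" with an optional "-qualifier" suffix. */
final class ReleaseVersion {
    final int major;
    final int minor;
    final int micro;
    final String qualifier; // e.g. "dev"; null when absent

    private ReleaseVersion(int major, int minor, int micro, String qualifier) {
        this.major = major;
        this.minor = minor;
        this.micro = micro;
        this.qualifier = qualifier;
    }

    static ReleaseVersion parse(String s) {
        String core = s;
        String qualifier = null;
        int dash = s.indexOf('-');
        if (dash >= 0) {
            core = s.substring(0, dash);       // "3.3.2"
            qualifier = s.substring(dash + 1); // "dev"
        }
        String[] parts = core.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                    "Invalid version number format, must be \"x.y.z[-qualifier]\": " + s);
        }
        // Only the numeric core reaches parseInt, so "2-dev" is never parsed as an int.
        return new ReleaseVersion(Integer.parseInt(parts[0]),
                Integer.parseInt(parts[1]), Integer.parseInt(parts[2]), qualifier);
    }
}
```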
47550 No Perforce job exists for this issue. 4 32815
9 years, 9 weeks ago 0|i05za7:
ZooKeeper ZOOKEEPER-912

ZooKeeper client logs trace and debug messages at level INFO

Improvement Patch Available Minor Unresolved Michi Mutsuzaki Anthony Urso Anthony Urso 26/Oct/10 02:42   05/Feb/20 07:11   3.3.1 3.7.0, 3.5.8 java client   1 2   ZK logs a lot of uninformative trace and debug messages at level INFO. This clutters the log and makes it easy to miss useful information. 70775 No Perforce job exists for this issue. 2 42101
3 years, 39 weeks, 2 days ago 0|i07kkv:
ZooKeeper ZOOKEEPER-911

move operations from methods to individual classes

New Feature Open Major Unresolved Thomas Koch Thomas Koch Thomas Koch 21/Oct/10 11:02   05/Feb/20 07:16     3.7.0, 3.5.8 java client   0 3   Copied from my email to the ZK dev list from 2010/05/26:

For my current code I'm using zkclient[1] and have also looked at cages[2] for
some ZK usage examples. I observed that there's a common pattern to wrap ZK
operations in callables and feed them to a "retryUntilConnected" executor.

Now my idea is that ZK should already come with operations in classes, e.g.:

o.a.z.operation.Create extends Operation implements Callable {

private path, data[], acl, createMode

public Create( .. all kinds of ctors .. )

public call(){
.. move code from Zookeeper.create() here
}
}

Similar classes should be provided for getChildren, delete, exists, getData,
getACL, setACL and setData.

One could then feed such operations to a ZkExecutor, which has the necessary
knowledge about the ZkConnection and can execute a command either
synchronously or asynchronously.

One could also wrap operations in an ExceptionCatcher to ignore certain
Exceptions or in a RetryPolicy.

This is only an idea so far, but I wanted to share my thoughts before starting
to try it out. (BTW: You can meet me at BerlinBuzzwords.de)

[1] http://github.com/sgroschupf/zkclient
[2] http://code.google.com/p/cages/

And a reply from Patrick Hunt at my mail:

Hi Thomas, you might take a look at this JIRA
https://issues.apache.org/jira/browse/ZOOKEEPER-679

there's definitely been interest in this area, however there are some
real challenges as well. Most users do end up wrapping the basic api
with some code, esp the "retry" metaphor is a common case, so I think it
would be valuable. At the same time getting the semantics right is hard
(and covering all the corner cases). Perhaps you could sync up with
Aaron/Chris, I'd personally like to see this go into contrib, but I
understand the extra burden the patch process presents -- it may make
more sense to rapidly iterate on something like github and then move to
contrib once you have something less frequently changing, where the
patch issue would be less of a problem (see 679, there's discussion on
this there). Regardless which way you take it we'd be happy to work with
you.
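The "operation as a Callable, fed to a retry executor" pattern described above can be sketched in plain Java. `RetryingExecutor` is hypothetical. With the real client, an operation would wrap a call such as `ZooKeeper.create()`, and the retry predicate might match `KeeperException.ConnectionLossException`. As Patrick notes, the hard part is semantics. For example, blindly retrying a non-idempotent create after a connection loss may hit a node that the first attempt already created.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

/**
 * Hypothetical retry wrapper: an operation is any Callable, re-run while it
 * fails with an exception the predicate classifies as retryable.
 */
final class RetryingExecutor {
    private final int maxAttempts;
    private final long backoffMs;
    private final Predicate<Exception> retryable;

    RetryingExecutor(int maxAttempts, long backoffMs, Predicate<Exception> retryable) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
        this.backoffMs = backoffMs;
        this.retryable = retryable;
    }

    <T> T execute(Callable<T> operation) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.call();
            } catch (Exception e) {
                // Give up on non-retryable errors or once the attempt budget is spent.
                if (attempt >= maxAttempts || !retryable.test(e)) {
                    throw e;
                }
                Thread.sleep(backoffMs);
            }
        }
    }
}
```

An "ExceptionCatcher" from the email would be another wrapper with the same shape that catches and discards selected exceptions instead of retrying.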
70777 No Perforce job exists for this issue. 1 42102
9 years, 5 weeks, 1 day ago 0|i07kl3:
ZooKeeper ZOOKEEPER-910

ZOOKEEPER-835 Use SelectionKey.isXYZ() methods instead of complicated binary logic

Sub-task Patch Available Minor Unresolved Michi Mutsuzaki Thomas Koch Thomas Koch 21/Oct/10 09:43   05/Feb/20 07:12     3.7.0, 3.5.8 server   0 0   The SelectionKey class provides methods to replace something like this

(k.readyOps() & (SelectionKey.OP_READ | SelectionKey.OP_WRITE)) != 0

with

selectionKey.isReadable() || selectionKey.isWritable()

It may be that the first version saves a CPU cycle or two, but the latter version saves developer brain cycles, which are much more expensive.

I suppose that there are many more places in the server code where this replacement could be done. I propose that whoever touches a code line like this should make the replacement.

70743 No Perforce job exists for this issue. 1 42103
1 year, 45 weeks ago 0|i07klb:
ZooKeeper ZOOKEEPER-909

ZOOKEEPER-823 Extract NIO specific code from ClientCnxn

Sub-task Closed Major Fixed Thomas Koch Thomas Koch Thomas Koch 21/Oct/10 09:26   23/Nov/11 14:22 10/Nov/10 17:40   3.4.0 java client   0 1   This patch is mostly the same as my last one for ZOOKEEPER-823, minus everything Netty-related. This means the patch only extracts all NIO-specific code into the class ClientCnxnSocketNIO, which extends ClientCnxnSocket.
I've redone this patch from current trunk step by step now and couldn't find any logical error. I've already done a couple of successful test runs and will continue to do so this night.

It would be nice if we could apply this patch to trunk as soon as possible. This would allow us to continue working on the Netty integration without blocking the ClientCnxn class. Adding Netty after this patch should only be a matter of adding the ClientCnxnSocketNetty class with the appropriate test cases.

You could help me by reviewing the patch and by running it on whatever test server you have available. Please send any complete failure log you encounter to thomas at koch point ro. Thx!

Update: Until now, I've collected 8 successful builds in a row!
47551 No Perforce job exists for this issue. 6 33371
9 years, 20 weeks, 1 day ago
Reviewed
netty 0|i062pr:
ZooKeeper ZOOKEEPER-908

ZOOKEEPER-835 Remove code duplication and inconsistent naming in ClientCnxn.Packet creation

Sub-task Closed Minor Fixed Thomas Koch Thomas Koch Thomas Koch 21/Oct/10 06:02   23/Nov/11 14:22 11/Nov/10 12:14   3.4.0 server   0 0   rename record -> request (since there is a counterpart record named "response")
rename header -> requestHeader (to distinguish from responseHeader)

remove the ByteBuffer creation code from the primeConnection() method and use the duplicate code in the Packet constructor instead. Therefore the ByteBuffer bb parameter can also be removed from the constructor's parameters.
47552 No Perforce job exists for this issue. 1 33372
9 years, 19 weeks, 6 days ago
Reviewed
0|i062pz:
ZooKeeper ZOOKEEPER-907

Spurious "KeeperErrorCode = Session moved" messages

Bug Closed Blocker Fixed Vishal Kher Vishal Kher Vishal Kher 20/Oct/10 14:27   23/Nov/11 14:22 04/Nov/10 12:29 3.3.1 3.3.2, 3.4.0     0 3   The sync request does not set the session owner in Request.

As a result, the leader keeps printing:
2010-07-01 10:55:36,733 - INFO [ProcessThread:-1:PrepRequestProcessor@405] - Got user-level KeeperException when processing sessionid:0x298d3b1fa90000 type:sync: cxid:0x6 zxid:0xfffffffffffffffe txntype:unknown reqpath:/ Error Path:null Error:KeeperErrorCode = Session moved
47553 No Perforce job exists for this issue. 2 32816
9 years, 20 weeks, 6 days ago
Reviewed
0|i05zaf:
ZooKeeper ZOOKEEPER-906

Improve C client connection reliability by making it sleep between reconnect attempts as in Java Client

Improvement Open Major Unresolved Radu Marin Radu Marin Radu Marin 19/Oct/10 21:45   05/Feb/20 07:16   3.3.1 3.7.0, 3.5.8 c client   0 3 86400 86400 0% Currently, when a C client gets disconnected, it retries a couple of hosts (not all) with no delay between attempts. If it doesn't succeed, it then sleeps for 1/3 of the session expiration timeout before trying again.
In the worst case, the disconnect event can occur after 2/3 of the session expiration timeout has passed, and sleeping for another 1/3 of the session timeout will, in most cases, cause the session to be lost.

A better approach is to check all hosts but with random delay between reconnect attempts. Also the delay must be independent of session timeout so if we increase the session timeout we also increase the number of available attempts.

This improvement covers the case when the C client experiences network problems for a short period of time and is not able to reach any zookeeper hosts.
The Java client already uses this logic, and it works very well.
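The proposed behavior has three parts: try every host, pause a random amount between attempts, and keep that pause independent of the session timeout. It can be sketched as a loop in which only the overall deadline depends on the session. `Reconnector` and its parameters are hypothetical, and `tryConnect` stands in for a real connection attempt. The issue concerns the C client, so this Java sketch only illustrates the control flow.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

/** Hypothetical reconnect loop: all hosts tried, random pause between attempts. */
final class Reconnector {
    /**
     * Returns the first host for which tryConnect succeeds, or null once
     * deadlineNanos (a System.nanoTime() value) has passed.
     */
    static String reconnect(List<String> hosts, Predicate<String> tryConnect,
                            long maxDelayMs, long deadlineNanos, Random rnd)
            throws InterruptedException {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("no hosts");
        }
        List<String> order = new ArrayList<>(hosts);
        while (System.nanoTime() - deadlineNanos < 0) {
            Collections.shuffle(order, rnd); // new random order on each pass
            for (String host : order) {
                if (tryConnect.test(host)) {
                    return host;
                }
                if (System.nanoTime() - deadlineNanos >= 0) {
                    return null;
                }
                // Random pause in [0, maxDelayMs), bounded independently of the session timeout.
                Thread.sleep((long) (rnd.nextDouble() * maxDelayMs));
            }
        }
        return null;
    }
}
```

Because the per-attempt delay is fixed by `maxDelayMs` while the deadline would be derived from the session, a longer session timeout yields more attempts, as the description asks.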
0% 0% 86400 86400 67887 No Perforce job exists for this issue. 1 42104
1 year, 7 weeks, 2 days ago zookeeper c-client 0|i07klj:
ZooKeeper ZOOKEEPER-905

enhance zkServer.sh for easier zookeeper automation-izing

Improvement Closed Minor Fixed Nicholas Harteau Nicholas Harteau Nicholas Harteau 19/Oct/10 17:48   23/Nov/11 14:22 07/Dec/10 14:15   3.4.0 scripts   0 0   zkServer.sh is good at starting zookeeper and figuring out the right options to pass along.

unfortunately if you want to wrap zookeeper startup/shutdown in any significant way, you have to reimplement a bunch of the logic there.

the attached patch addresses a couple simple issues:
1. add a 'start-foreground' option to zkServer.sh - this allows things that expect to manage a foregrounded process (daemontools, launchd, etc) to use zkServer.sh instead of rolling their own to launch zookeeper

2. add a 'print-cmd' option to zkServer.sh - rather than launching zookeeper from the script, just give me the command you'd normally use to exec zookeeper. I found this useful when writing automation to start/stop zookeeper as part of smoke testing zookeeper-based applications

3. Deal more gracefully with supplying alternate configuration files to zookeeper - currently the script assumes all config files reside in $ZOOCFGDIR - also useful for smoke testing

4. communicate extra info ("JMX enabled") about zookeeper on STDERR rather than STDOUT (necessary for #2)

5. fixes an issue on macos where readlink doesn't have the '-f' option.

47554 No Perforce job exists for this issue. 1 33373
9 years, 16 weeks ago
Reviewed
0|i062q7:
ZooKeeper ZOOKEEPER-904

super digest is not actually acting as a full superuser

Bug Closed Major Fixed Camille Fournier Camille Fournier Camille Fournier 19/Oct/10 16:44   23/Nov/11 14:22 26/Oct/10 18:31 3.3.1 3.3.2, 3.4.0 server   0 2   The documentation states:
New in 3.2: Enables a ZooKeeper ensemble administrator to access the znode hierarchy as a "super" user. In particular no ACL checking occurs for a user authenticated as super.

However, if a super user does something like:
zk.setACL("/", Ids.READ_ACL_UNSAFE, -1);

the super user is now bound by the read-only ACL. This is not what I would expect given the documentation. It can be fixed by moving the check for the "super" authId in PrepRequestProcessor.checkACL to before the for(ACL a : acl) loop.
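The suggested fix is an ordering change: look for a "super" auth id before evaluating the node's ACL entries. The sketch shows that ordering with simplified, hypothetical stand-in types. The real `PrepRequestProcessor.checkACL` uses ZooKeeper's own `Id`/`ACL` classes and pluggable authentication providers. Here, exact scheme/id equality replaces the provider matching logic. The permission bits mirror READ = 1 and WRITE = 2.

```java
import java.util.List;

/** Simplified, hypothetical model of a per-node ACL check. */
final class AclCheck {
    static final int READ = 1;
    static final int WRITE = 2;

    static final class Id {
        final String scheme;
        final String id;
        Id(String scheme, String id) { this.scheme = scheme; this.id = id; }
    }

    static final class Acl {
        final int perms;
        final Id id;
        Acl(int perms, Id id) { this.perms = perms; this.id = id; }
    }

    /** Returns true if the caller's auth ids grant {@code perm} on a node with {@code acl}. */
    static boolean checkAcl(List<Acl> acl, int perm, List<Id> authIds) {
        // Super user first, so no ACL set on the node (e.g. read-only) can restrict it.
        for (Id auth : authIds) {
            if ("super".equals(auth.scheme)) {
                return true;
            }
        }
        if (acl == null || acl.isEmpty()) {
            return true;
        }
        for (Acl a : acl) {
            if ((a.perms & perm) == 0) {
                continue; // this entry does not grant the requested permission
            }
            if ("world".equals(a.id.scheme) && "anyone".equals(a.id.id)) {
                return true;
            }
            for (Id auth : authIds) {
                if (auth.scheme.equals(a.id.scheme) && auth.id.equals(a.id.id)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```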
47555 No Perforce job exists for this issue. 2 32817
9 years, 22 weeks, 1 day ago
Reviewed
0|i05zan:
ZooKeeper ZOOKEEPER-903

Create a testing jar with useful classes from ZK test source

Improvement Resolved Major Implemented Unassigned Camille Fournier Camille Fournier 18/Oct/10 14:25   09/Oct/13 20:30 09/Oct/13 20:30     tests   0 0   From mailing list:
-----Original Message-----
From: Benjamin Reed
Sent: Monday, October 18, 2010 11:12 AM
To: zookeeper-user@hadoop.apache.org
Subject: Re: Testing zookeeper outside the source distribution?

we should be exposing those classes and releasing them as a testing
jar. do you want to open up a jira to track this issue?

ben

On 10/18/2010 05:17 AM, Anthony Urso wrote:
> Anyone have any pointers on how to test against ZK outside of the
> source distribution? All the fun classes (e.g. ClientBase) do not make
> it into the ZK release jar.
>
> Right now I am manually running a ZK node for the unit tests to
> connect to prior to running my test, but I would rather have something
> that ant could reliably
> automate starting and stopping for CI.
>
> Thanks,
> Anthony
36651 No Perforce job exists for this issue. 0 42105
6 years, 24 weeks, 1 day ago 0|i07klr:
ZooKeeper ZOOKEEPER-902

Fix findbug issue in trunk "Malicious code vulnerability"

Bug Closed Minor Fixed Flavio Paiva Junqueira Patrick D. Hunt Patrick D. Hunt 18/Oct/10 13:41   23/Nov/11 14:21 07/Feb/11 14:27 3.4.0 3.4.0 quorum, server   0 2   https://hudson.apache.org/hudson/view/ZooKeeper/job/ZooKeeper-trunk/970/artifact/trunk/findbugs/zookeeper-findbugs-report.html#Warnings_MALICIOUS_CODE

Malicious code vulnerability Warnings

Code Warning
MS org.apache.zookeeper.server.quorum.LeaderElection.epochGen isn't final but should be
47556 No Perforce job exists for this issue. 9 32818
9 years, 7 weeks, 3 days ago 0|i05zav:
ZooKeeper ZOOKEEPER-901

Redesign of QuorumCnxManager

Improvement Open Major Unresolved Michael Han Flavio Paiva Junqueira Flavio Paiva Junqueira 17/Oct/10 09:09   14/Dec/19 06:08   3.3.1 3.7.0 leaderElection   0 10   QuorumCnxManager manages TCP connections between ZooKeeper servers for leader election in replicated mode. We have identified over time a couple of deficiencies that we would like to fix. Unfortunately, fixing these issues requires a little more than just generating a couple of small patches. More specifically, I propose, based on previous discussions with the community, that we reimplement QuorumCnxManager so that we achieve the following:

# Establishing connections should not be a blocking operation, and perhaps even more important, it shouldn't prevent the establishment of connections with other servers;
# Using a pair of threads per connection is a little messy, and we have seen issues over time due to the creation and destruction of such threads. A more reasonable approach is to have a single thread and a selector.
70771 No Perforce job exists for this issue. 0 42106
2 years, 51 weeks ago 0|i07klz:
ZooKeeper ZOOKEEPER-900

FLE implementation should be improved to use non-blocking sockets

Improvement Open Major Unresolved Martin Kuchta Vishal Kher Vishal Kher 15/Oct/10 10:07   14/Dec/19 06:06     3.7.0     3 14   ZOOKEEPER-932, ZOOKEEPER-933, ZOOKEEPER-934 From earlier email exchanges:
1. Blocking connects and accepts:

a) The first problem is in manager.toSend(). This invokes connectOne(), which does a blocking connect. While testing, I changed the code so that connectOne() starts a new thread called AsyncConnect(). AsyncConnect.run() does a socketChannel.connect(). After starting AsyncConnect, connectOne starts a timer. connectOne continues with normal operations if the connection is established before the timer expires; otherwise, when the timer expires, it interrupts the AsyncConnect() thread and returns. In this way, I can put an upper bound on the amount of time we need to wait for a connect to succeed. Of course, this was a quick fix for my testing. Ideally, we should use a Selector to do non-blocking connects/accepts. I am planning to do that later, once we at least have a quick fix for the problem and consensus from others on the real fix (this problem is a big blocker for us). Note that it is OK to do blocking IO in the SenderWorker and RecvWorker threads, since each of them only blocks on IO to its own peer.

b) The blocking IO problem is not restricted to connectOne(); it also occurs in receiveConnection(). The Listener thread calls receiveConnection() for each incoming connection request. receiveConnection() does blocking IO to get the peer's info (s.read(msgBuffer)). Worse, it invokes connectOne() back to the peer that sent the connection request. All of this happens in the Listener thread. In short, if a peer fails after initiating a connection, the Listener thread won't be able to accept connections from other peers, because it will be stuck in read() or connectOne(). The code also has an inherent cycle: initiateConnection() and receiveConnection() have to be synchronized very carefully, or we could run into deadlocks. This code is going to be difficult to maintain and modify.

Also see: https://issues.apache.org/jira/browse/ZOOKEEPER-822
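Step (a) bounds connect time with an extra thread and a timer. For a single blocking socket, `java.net.Socket.connect(SocketAddress, int timeout)` gives the same upper bound without the extra thread. In this minimal sketch, `BoundedConnect` is a hypothetical helper. It still blocks the calling thread for up to `timeoutMs`, so it does not replace the Selector-based design described above. It only removes the unbounded wait.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: TCP connect with an explicit upper bound on waiting. */
final class BoundedConnect {
    /**
     * Opens a connection to addr, failing with SocketTimeoutException after
     * timeoutMs rather than waiting for the OS default connect timeout.
     */
    static Socket connect(InetSocketAddress addr, int timeoutMs) throws IOException {
        Socket sock = new Socket();
        try {
            sock.connect(addr, timeoutMs);
            return sock;
        } catch (IOException e) {
            sock.close(); // do not leak the unconnected socket
            throw e;
        }
    }
}
```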
170 No Perforce job exists for this issue. 4 32819
1 year, 25 weeks, 6 days ago 0|i05zb3:
ZooKeeper ZOOKEEPER-899

Update Netty version in trunk to 3.2.2

Task Resolved Major Fixed Thomas Koch Thomas Koch Thomas Koch 15/Oct/10 08:49   17/Sep/11 06:56 16/Sep/11 20:09   3.5.0 build   0 1   The patch for ZOOKEEPER-823 already has netty version 3.2.1.Final, while trunk still has 3.1.5.GA. Could you please update the netty version in trunk so that we can rule out the version difference as a cause of the failures? Note that the most recent version of netty is already 3.2.2, not 3.2.1 as in ZOOKEEPER-823.

- <dependency org="org.jboss.netty" name="netty" conf="default" rev="3.1.5.GA">
+ <dependency org="org.jboss.netty" name="netty" conf="default" rev="3.2.1.Final">
34432 No Perforce job exists for this issue. 4 33374
8 years, 27 weeks, 5 days ago
Reviewed
netty,maven 0|i062qf:
ZooKeeper ZOOKEEPER-898

C Client might not cleanup correctly during close

Bug Closed Trivial Fixed Jared Cantwell Jared Cantwell Jared Cantwell 14/Oct/10 15:35   23/Nov/11 14:22 28/Oct/10 14:51   3.3.2, 3.4.0 c client   0 1   I was looking through the c-client code and noticed a situation where a counter can be incorrectly incremented and a small memory leak can occur.

In zookeeper.c : add_completion(), if close_requested is true, then the completion will not be queued. But at the end, outstanding_sync is still incremented and free() never called on the newly allocated completion_list_t.

I will submit for review a diff that I believe corrects this issue.
47557 No Perforce job exists for this issue. 2 32820
9 years, 22 weeks ago
Reviewed
0|i05zbb:
ZooKeeper ZOOKEEPER-897

C Client seg faults during close

Bug Closed Major Fixed Jared Cantwell Jared Cantwell Jared Cantwell 14/Oct/10 15:26   23/Nov/11 14:22 28/Oct/10 12:25   3.3.2, 3.4.0 c client   0 1   We observed a crash while closing our C client. It occurred in the do_io() thread, which was still processing during the close() call.

#0 queue_buffer (list=0x6bd4f8, b=0x0, add_to_front=0) at src/zookeeper.c:969
#1 0x000000000046234e in check_events (zh=0x6bd480, events=<value optimized out>) at src/zookeeper.c:1687
#2 0x0000000000462d74 in zookeeper_process (zh=0x6bd480, events=2) at src/zookeeper.c:1971
#3 0x0000000000469c34 in do_io (v=0x6bd480) at src/mt_adaptor.c:311
#4 0x00007ffff7bc59ca in start_thread () from /lib/libpthread.so.0
#5 0x00007ffff6f706fd in clone () from /lib/libc.so.6
#6 0x0000000000000000 in ?? ()

We tracked down the sequence of events, and the cause is that input_buffer is being freed from a thread other than the do_io thread that relies on it:

1. do_io() calls check_events()
2. if(events&ZOOKEEPER_READ) branch executes
3. if (rc > 0) branch executes
4. if (zh->input_buffer != &zh->primer_buffer) branch executes
.....in the meantime......
5. zookeeper_close() called
6. if (inc_ref_counter(zh,0)!=0) branch executes
7. cleanup_bufs() is called
8. input_buffer is freed at the end
..... back to check_events().........
9. queue_buffer() is called on a NULL buffer.

I believe the fix is to call only free_completions() in zookeeper_close(), and not cleanup_bufs(). The original reason cleanup_bufs() was added was to call any outstanding synchronous completions, so only free_completions() (which is guarded) is needed. I will submit a patch for review with this change.
47558 No Perforce job exists for this issue. 2 32821
9 years, 22 weeks ago
Reviewed
0|i05zbj:
ZooKeeper ZOOKEEPER-896

Improve client to support dynamic authentication schemes

Improvement Patch Available Major Unresolved Botond Hejj Botond Hejj Botond Hejj 14/Oct/10 08:57   05/Feb/20 07:11     3.7.0, 3.5.8 c client, java client   1 6   When we started exploring zookeeper for our requirements we found the authentication mechanism is not flexible enough.
We want to use Kerberos for authentication, but using the current API we ran into a few problems. The idea is that we get a Kerberos token on the client side and then send that token to the server with a Kerberos scheme. A server-side authentication plugin can use that token to authenticate the client and also use the token for authorization.
We ran into two problems with this approach:
1. A different kerberos token is needed for each different server that client can connect to since kerberos uses mutual authentication. That means when the client acquires this kerberos token it has to know which server it connects to and generate the token according to that. The client currently can't generate a token for a specific server. The token stored in the auth_info is used for all the servers.
2. The Kerberos token might have an expiry time, so if the client loses the connection to the server and then tries to reconnect, it should acquire a new token. That is currently not possible, since the token is stored in auth_info and reused for every connection.

The problem can be solved if we allow the client to register a callback for authentication instead of a static token. This could be a callback that takes the current host string as an argument. The ZooKeeper client code could call this callback before it sends the authentication info to the server, to get a fresh, server-specific token.

This would solve our problem with the kerberos authentication and also could be used for other more dynamic authentication schemes.
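The proposal replaces a token stored once in `auth_info` with a callback that is invoked, with the target host, on every connect. The Java sketch below only illustrates that shape. `DynamicAuthInfo`, `AuthPacket`, and the "krb" scheme name in the usage are hypothetical and are not an existing ZooKeeper API. A real callback would obtain a host-specific Kerberos ticket where the test uses a counter.

```java
import java.util.function.Function;

/** Hypothetical (scheme, token) pair sent to the server right after connecting. */
final class AuthPacket {
    final String scheme;
    final byte[] token;

    AuthPacket(String scheme, byte[] token) {
        this.scheme = scheme;
        this.token = token;
    }
}

/**
 * Hypothetical client-side auth hook: rather than replaying one stored token,
 * the client asks tokenForHost for a fresh token each time it connects.
 */
final class DynamicAuthInfo {
    private final String scheme;
    private final Function<String, byte[]> tokenForHost;

    DynamicAuthInfo(String scheme, Function<String, byte[]> tokenForHost) {
        this.scheme = scheme;
        this.tokenForHost = tokenForHost;
    }

    /** Called on every connect or reconnect, so tokens can be host-specific and unexpired. */
    AuthPacket authPacketFor(String host) {
        return new AuthPacket(scheme, tokenForHost.apply(host));
    }
}
```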
70787 No Perforce job exists for this issue. 6 42107
3 years, 31 weeks, 2 days ago 0|i07km7:
ZooKeeper ZOOKEEPER-895

ClientCnxn.authInfo must be thread safe

Bug Resolved Major Fixed Unassigned Thomas Koch Thomas Koch 14/Oct/10 03:25   19/Nov/10 12:40 19/Nov/10 12:40         0 1   authInfo can be accessed concurrently by different Threads, as exercised in
org.apache.zookeeper.test.ACLTest

The two concurrent access points in this case were (presumably):
org.apache.zookeeper.ClientCnxn$SendThread.primeConnection(ClientCnxn.java:805) and
org.apache.zookeeper.ClientCnxn.addAuthInfo(ClientCnxn.java:1121)

The line numbers refer to the latest patch in ZOOKEEPER-823.

The exception that pointed to this issue:
[junit] 2010-10-13 09:35:55,113 [myid:] - WARN [main-SendThread(localhost:11221):ClientCnxn$SendThread@713] - Session 0x0 for server localhost/127.0.0.1:11221, unexpected error, closing socket connection and attempting reconnect
[junit] java.util.ConcurrentModificationException
[junit] at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
[junit] at java.util.AbstractList$Itr.next(AbstractList.java:343)
[junit] at org.apache.zookeeper.ClientCnxn$SendThread.primeConnection(ClientCnxn.java:805)
[junit] at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:247)
[junit] at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:694)

Proposed solution: use a thread-safe list for authInfo.
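The exception arises because one thread iterates the list in primeConnection() while another appends to it in addAuthInfo(). This minimal sketch of the proposed thread-safe list assumes `java.util.concurrent.CopyOnWriteArrayList`, which is a reasonable fit when additions are rare and iteration happens on every connect. `SharedAuthInfo` and `AuthData` are hypothetical stand-ins for the ClientCnxn fields. A `Collections.synchronizedList` would also work, but every iteration would then need to hold the list's lock manually.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical holder for auth entries shared by the API thread and SendThread. */
final class SharedAuthInfo {
    static final class AuthData {
        final String scheme;
        final byte[] data;
        AuthData(String scheme, byte[] data) { this.scheme = scheme; this.data = data; }
    }

    // Iterators over a CopyOnWriteArrayList work on a snapshot and never throw
    // ConcurrentModificationException, so one thread can iterate while another adds.
    private final List<AuthData> authInfo = new CopyOnWriteArrayList<>();

    void add(String scheme, byte[] data) {
        authInfo.add(new AuthData(scheme, data));
    }

    List<AuthData> entries() {
        return authInfo;
    }
}
```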
47559 No Perforce job exists for this issue. 0 32822
9 years, 18 weeks, 6 days ago 0|i05zbr:
ZooKeeper ZOOKEEPER-894

ZOOKEEPER-835 add Package o.a.zookeeper.client

Sub-task Open Major Unresolved Unassigned Thomas Koch Thomas Koch 13/Oct/10 05:51   13/Oct/10 13:24           0 1   I'd like to move classes that are not part of the API but belong to the ZK Client into a separate Client package. These classes are:

- Inner classes that should become normal classes:
Zookeeper.ZkWatchManager
Zookeeper.WatchRegistration
ClientCnxn.SendThread (should become a Runnable anyhow)
ClientCnxn.EventThread
ClientCnxn.Packet
ClientCnxn.AuthData ?

- Classes now in the zookeeper package:
ClientCnxn -> Client.Cnxn
ClientCnxnSocket* -> Client.CnxnSocket*
... Maybe some others that can be moved without breaking the API

- Classes yet to be written:
PendingQueue ?
OutgoingQueue ?
36652 No Perforce job exists for this issue. 0 42108
9 years, 24 weeks, 1 day ago 0|i07kmf:
ZooKeeper ZOOKEEPER-893

ZooKeeper high cpu usage when invalid requests

Bug Closed Critical Fixed Thijs Terlouw Thijs Terlouw Thijs Terlouw 11/Oct/10 03:15   23/Nov/11 14:22 19/Oct/10 18:38 3.3.1 3.3.2, 3.4.0 server   0 3 3600 3600 0% Linux 2.6.16
4x Intel(R) Xeon(R) CPU X3320 @ 2.50GHz
java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode)
When ZooKeeper receives certain illegally formed messages on the internal communication port (:4181 by default), it's possible for ZooKeeper to enter an infinite loop which causes 100% cpu usage. It's related to ZOOKEEPER-427, but that patch does not resolve all issues.

from: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java

the two affected parts:
===========
int length = msgLength.getInt();
if (length <= 0) {
    throw new IOException("Invalid packet length:" + length);
}
===========


===========
while (message.hasRemaining()) {
    temp_numbytes = channel.read(message);
    if (temp_numbytes < 0) {
        throw new IOException("Channel eof before end");
    }
    numbytes += temp_numbytes;
}
===========

how to replicate this bug:

perform an nmap portscan against your zookeeper server: "nmap -sV -n your.ip.here -p4181"
Wait until you see some messages in the logfile; you will then see 100% CPU usage. It does not recover from this situation. With my patch, this no longer occurs.
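The first excerpt rejects only non-positive lengths, so a scanner's probe can still produce a very large positive length. One plausible hardening is to bound the length from above before allocating the body buffer, and to fail on EOF while reading. This may not be what the attached patch does. The sketch assumes a blocking channel; with a non-blocking channel, `read` could return 0 repeatedly and this loop would spin. `PacketReader` and `MAX_PACKET_LENGTH` are hypothetical, and the 512 KiB value is arbitrary.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

/** Hypothetical reader for 4-byte length-prefixed packets on a blocking channel. */
final class PacketReader {
    /** Hypothetical upper bound; a real limit would come from the largest legal message. */
    static final int MAX_PACKET_LENGTH = 512 * 1024;

    /** Reads one packet, rejecting lengths outside (0, MAX_PACKET_LENGTH]. */
    static ByteBuffer readPacket(ReadableByteChannel ch) throws IOException {
        ByteBuffer header = ByteBuffer.allocate(4);
        readFully(ch, header);
        header.flip();
        int length = header.getInt();
        if (length <= 0 || length > MAX_PACKET_LENGTH) {
            throw new IOException("Invalid packet length: " + length);
        }
        ByteBuffer body = ByteBuffer.allocate(length); // allocated only after validation
        readFully(ch, body);
        body.flip();
        return body;
    }

    private static void readFully(ReadableByteChannel ch, ByteBuffer buf) throws IOException {
        while (buf.hasRemaining()) {
            if (ch.read(buf) < 0) {
                throw new IOException("Channel eof before end");
            }
        }
    }
}
```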
0% 0% 3600 3600 47560 No Perforce job exists for this issue. 3 32823
9 years, 23 weeks, 1 day ago
Reviewed
zookeeper server cpu ZOOKEEPER-427 0|i05zbz:
ZooKeeper ZOOKEEPER-892

Remote replication of Zookeeper data

New Feature Open Major Unresolved Anirban Roy Anirban Roy Anirban Roy 08/Oct/10 06:38   14/Dec/19 06:08   3.4.0 3.7.0 server 15/May/11 3 18 9676800 9676800 0% [root@llf531123 Zookeeper]# uname -a
Linux llf531123.crawl.yahoo.net 2.6.9-67.0.22.ELsmp #1 SMP Fri Jul 11 10:37:57 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
[root@llf531123 Zookeeper]# java -version
java version "1.6.0_03"
Java(TM) SE Runtime Environment (build 1.6.0_03-b05)
Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_03-b05, mixed mode)
[root@llf531123 Zookeeper]#
ZooKeeper is a highly available and scalable system for distributed synchrony and is frequently used for cluster management. In its current incarnation it has issues with communication and data replication across extended geographic locations. Presently, the only way to distribute ZooKeeper across multiple data centers is to maintain a cross-colo Quorum using Observer members, leading to unnecessary bandwidth consumption and performance impacts. As the title suggests, this work aims to provide replication of ZooKeeper data from one site to others using a new type of ZooKeeper member called a Publisher. The broad idea is to have a complete instance of the current ZooKeeper at each geographic location in a master-slave setup. The Publisher will be part of the Master ZooKeeper Site and will push changes to a FIFO queue and make them available to any interested client. The slave ZooKeeper runs a client application called Replicator, which receives the changes and replays them on the slave instance. Multiple slave Replicators can subscribe to the master Publisher and receive changes with guaranteed ordering. It will be asynchronous, non-intrusive, loosely coupled, and can be applied to a subset of the data. This scheme will bring many of the benefits of database replication, such as resilience to site failure and localized serving across data centers. In short, the goal is to provide remote (sub-tree) data replication with guaranteed ordering, without affecting the Master ZooKeeper performance. 0% 0% 9676800 9676800 37 No Perforce job exists for this issue. 3 42109
6 years, 45 weeks, 2 days ago ZOOKEEPER-892. Remote replication of ZooKeeper data (Anirban Roy) zkrepl replication zoorepl 0|i07kmn:
ZooKeeper ZOOKEEPER-891

Allow non-numeric version strings

Improvement Closed Minor Duplicate Unassigned Eli Collins Eli Collins 07/Oct/10 21:38   23/Nov/11 14:22 09/Nov/10 18:17   3.4.0 build   0 0   Non-numeric version strings (e.g. -dev) are not currently accepted: you either get an error (Invalid version number format, must be "x.y.z"), or, if you pass x.y.z-dev or x.y.z+1, you get a NumberFormatException. It would be useful to allow non-numeric versions.

{noformat}
version-info:
[java] All version-related parameters must be valid integers!
[java] Exception in thread "main" java.lang.NumberFormatException: For input string: "3-dev"
[java] at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
[java] at java.lang.Integer.parseInt(Integer.java:458)
[java] at java.lang.Integer.parseInt(Integer.java:499)
[java] at org.apache.zookeeper.version.util.VerGen.main(VerGen.java:131)
[java] Java Result: 1
{noformat}
214200 No Perforce job exists for this issue. 0 33375
9 years, 20 weeks, 2 days ago 0|i062qn:
ZooKeeper ZOOKEEPER-890

C client invokes watcher callbacks multiple times

Bug Resolved Critical Not A Problem Unassigned Austin Bennett Austin Bennett 07/Oct/10 03:53   13/Oct/10 13:11 13/Oct/10 13:11 3.3.1   c client   0 0   Mac OS X 10.6.5 Code using the C client assumes that watcher callbacks are called exactly once. If the watcher is called more than once, the process will likely overwrite freed memory and/or crash.

collect_session_watchers (zk_hashtable.c) gathers watchers from active_node_watchers, active_exist_watchers, and active_child_watchers without removing them. This results in watchers being invoked more than once.

Test code is attached that reproduces the bug, along with a proposed patch.
214199 No Perforce job exists for this issue. 2 32824
9 years, 24 weeks, 1 day ago 0|i05zc7:
ZooKeeper ZOOKEEPER-889

pyzoo_aget_children crashes due to incorrect watcher context

Bug Resolved Critical Fixed Unassigned Austin Bennett Austin Bennett 07/Oct/10 00:16   07/Oct/10 00:19 07/Oct/10 00:19 3.3.1   contrib-bindings   0 1   OS X 10.6.5, Python 2.6.1 The pyzoo_aget_children function passes the completion callback ("pyw") in place of the watcher callback ("get_pyw"). Since it is a one-shot callback, it is deallocated after the completion callback fires, causing a crash when the watcher callback should be invoked.
47561 No Perforce job exists for this issue. 1 32825
9 years, 25 weeks ago 0|i05zcf:
ZooKeeper ZOOKEEPER-888

c-client / zkpython: Double free corruption on node watcher

Bug Closed Critical Fixed Lukas Lukas Lukas 06/Oct/10 09:26   23/Nov/11 14:22 19/Oct/10 15:02 3.3.1 3.3.2, 3.3.3, 3.4.0 c client, contrib-bindings   1 3   The C client / zkpython wrapper invokes an already-freed watcher callback.

Steps to reproduce:
0. Start a ZooKeeper server on your machine.
1. Run the attached Python script.
2. Suspend the ZooKeeper server process (e.g. using `pkill -STOP -f org.apache.zookeeper.server.quorum.QuorumPeerMain`).
3. Wait until the connection observer and the node observer have fired with a session event.
4. Resume the ZooKeeper server process (e.g. using `pkill -CONT -f org.apache.zookeeper.server.quorum.QuorumPeerMain`).

Result: the client tries to dispatch the node observer function again, but it has already been freed, which leads to double-free corruption.
47562 No Perforce job exists for this issue. 3 32826
9 years, 23 weeks, 1 day ago
Reviewed
0|i05zcn:
ZooKeeper ZOOKEEPER-887

Bug at - Producer-Consumer Example

Bug Open Minor Unresolved Unassigned sanjivsingh sanjivsingh 06/Oct/10 00:26   08/Sep/16 02:07       java client   1 2   I tried to test the Producer-Consumer example published at:
http://hadoop.apache.org/zookeeper/docs/r3.0.0/zookeeperTutorial.html

Queue.produce(int p) works correctly.

There is a problem in the Queue.consume() method:

int consume() throws KeeperException, InterruptedException {
    int retvalue = -1;
    Stat stat = null;

    // Get the first element available
    while (true) {
        synchronized (mutex) {
            List<String> list = zk.getChildren(root, true);
            if (list.size() == 0) {
                System.out.println("Going to wait");
                mutex.wait();
            } else {
                Integer min = new Integer(list.get(0).substring(7));
                for (String s : list) {
                    Integer tempValue = new Integer(s.substring(7));
                    //System.out.println("Temporary value: " + tempValue);
                    if (tempValue < min) min = tempValue;
                }
                System.out.println("Temporary value: " + root + "/element" + min);
                byte[] b = zk.getData(root + "/element" + min, false, stat);
                zk.delete(root + "/element" + min, 0);
                ByteBuffer buffer = ByteBuffer.wrap(b);
                retvalue = buffer.getInt();

                return retvalue;
            }
        }
    }
}

What produce() does is add children under root named
element000000001, element000000002, element000000003, etc.

However, in the consume() method:
1. Integer min = new Integer(list.get(0).substring(7));
2. for(String s : list){
3.     Integer tempValue = new Integer(s.substring(7));
4.     if(tempValue < min) min = tempValue;
5. }
6. byte[] b = zk.getData(root + "/element" + min, false, stat);
7. zk.delete(root + "/element" + min, 0);

Lines 1 and 3 convert a String such as "000000001" into the Integer 1. Because of this, lines 6 and 7
try to access the znode root + "/element1" rather than root + "/element000000001",
which definitely does not exist.

I am putting forward a solution:

int consume() throws KeeperException, InterruptedException {
    int retvalue = -1;
    Stat stat = null;

    // Get the first element available
    while (true) {
        synchronized (mutex) {
            List<String> list = zk.getChildren(root, true);
            if (list.size() == 0) {
                System.out.println("Going to wait");
                mutex.wait();
            } else {
                // Find the index of the child with the smallest sequence number.
                // "element" is 7 characters, so substring(7) is the numeric suffix.
                int min = Integer.parseInt(list.get(0).substring(7));
                int p = 0;
                for (int i = 1; i < list.size(); i++) {
                    int tempValue = Integer.parseInt(list.get(i).substring(7));
                    if (tempValue < min) {
                        min = tempValue; // keep the running minimum up to date
                        p = i;
                    }
                }

                // Build the path from the original child name so the leading
                // zeros are preserved (e.g. "element000000001", not "element1")
                String path = root + "/" + list.get(p);
                byte[] b = zk.getData(path, false, stat);
                zk.delete(path, 0);
                ByteBuffer buffer = ByteBuffer.wrap(b);
                retvalue = buffer.getInt();

                return retvalue;
            }
        }
    }
}
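The tutorial's produce() creates sequential znodes named "element" followed by a zero-padded counter, so the selection step can be pulled out and tested on its own. Here is a minimal Java sketch. `QueueChildSelector` and `smallestChild` are hypothetical helper names, and the "element" prefix comes from the tutorial.

```java
import java.util.List;

// Hypothetical helper: picks the queue child with the smallest sequence number.
class QueueChildSelector {
    private static final String PREFIX = "element";

    // Compares sequence suffixes numerically but returns the original child
    // name, so the leading zeros survive when the znode path is rebuilt.
    static String smallestChild(List<String> children) {
        if (children.isEmpty()) {
            throw new IllegalArgumentException("no children to consume");
        }
        String best = children.get(0);
        int bestSeq = Integer.parseInt(best.substring(PREFIX.length()));
        for (String child : children) {
            int seq = Integer.parseInt(child.substring(PREFIX.length()));
            if (seq < bestSeq) {
                bestSeq = seq;
                best = child;
            }
        }
        return best;
    }
}
```

Inside consume(), the path would then be `root + "/" + QueueChildSelector.smallestChild(list)`, which keeps the zero padding intact.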

36653 No Perforce job exists for this issue. 1 32827
3 years, 28 weeks ago 0|i05zcv:
ZooKeeper ZOOKEEPER-886

Hedwig Server stays in "disconnected" state when connection to ZK dies but gets reconnected

Bug Resolved Major Fixed Erwin Tam Erwin Tam Erwin Tam 05/Oct/10 16:18   12/Oct/10 06:52 11/Oct/10 16:55     contrib-hedwig   0 1   The Hedwig server is connected to ZooKeeper. In the ZkTopicManager, it registers a watcher so that, if it ever gets disconnected from ZK, it temporarily fails all incoming requests, since the Hedwig server does not know for sure whether it is still the master for the topics. When the ZK client reconnects, the current logic is wrong: it does not unset the suspended flag. Thus, once the server gets disconnected, it stays in the suspended state forever, leaving the Hedwig hub server dead. 47563 No Perforce job exists for this issue. 1 32828
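Here is a minimal Java sketch of the intended state handling. The flag is set on disconnect and, crucially, cleared again on reconnect. `TopicManagerGate` and its `ConnectionState` enum are hypothetical stand-ins, not the ZkTopicManager or ZooKeeper client API.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical gate in front of topic requests, driven by ZK connection state.
class TopicManagerGate {
    // Hypothetical stand-in for the ZK client's connection states.
    enum ConnectionState { CONNECTED, DISCONNECTED }

    private final AtomicBoolean suspended = new AtomicBoolean(false);

    // Called from the connection watcher on every state change.
    void onConnectionStateChange(ConnectionState state) {
        if (state == ConnectionState.DISCONNECTED) {
            // Mastership of topics is uncertain: fail requests for now.
            suspended.set(true);
        } else if (state == ConnectionState.CONNECTED) {
            // The step reported as missing: clear the flag on reconnect.
            suspended.set(false);
        }
    }

    boolean acceptsRequests() {
        return !suspended.get();
    }
}
```

An AtomicBoolean is used because the watcher thread writes the flag while request-handling threads read it.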
9 years, 24 weeks, 2 days ago
Reviewed
0|i05zd3:
ZooKeeper ZOOKEEPER-885

Zookeeper drops connections under moderate IO load

Bug Open Major Unresolved Unassigned Alexandre Hardy Alexandre Hardy 01/Oct/10 10:43   14/Dec/19 06:07   3.2.2, 3.3.1 3.7.0 server   4 18   Debian (Lenny)
1Gb RAM
swap disabled
100Mb heap for zookeeper
A ZooKeeper server under minimal load, with a number of clients watching exactly one node, fails to maintain the connections when the machine is subjected to moderate IO load.

In a specific test, we had three ZooKeeper servers running on dedicated machines with 45 clients connected, all watching exactly one node. The clients would disconnect after moderate load was added to each of the ZooKeeper servers with the command:
{noformat}
dd if=/dev/urandom of=/dev/mapper/nimbula-test
{noformat}

The {{dd}} command transferred data at a rate of about 4Mb/s.

The same thing happens with
{noformat}
dd if=/dev/zero of=/dev/mapper/nimbula-test
{noformat}

It seems strange that such a moderate load should cause instability in the connection.

Very few other processes were running; the machines were set up to test the connection instability we have experienced. Clients performed no other read or mutation operations.

Although the documentation states that minimal competing IO load should be present on the ZooKeeper server, it seems reasonable that moderate IO should not cause problems in this case.
36654 No Perforce job exists for this issue. 5 32829
3 years, 50 weeks, 1 day ago 0|i05zdb:
ZooKeeper ZOOKEEPER-884

Remove LedgerSequence references from BookKeeper documentation and comments in tests

Bug Closed Major Fixed Flavio Paiva Junqueira Flavio Paiva Junqueira Flavio Paiva Junqueira 01/Oct/10 05:11   23/Nov/11 14:22 05/Nov/10 01:18 3.3.1 3.4.0 contrib-bookkeeper   0 1   We no longer use LedgerSequence, so we need to remove references in documentation and comments sprinkled throughout the code. 47564 No Perforce job exists for this issue. 1 32830
9 years, 20 weeks, 6 days ago
Reviewed
0|i05zdj:
ZooKeeper ZOOKEEPER-883

Idle cluster increasingly consumes CPU resources

Bug Resolved Major Implemented Unassigned Lars George Lars George 30/Sep/10 04:38   09/Oct/13 20:33 09/Oct/13 20:33 3.3.1   server   0 1   Monitoring the ZooKeeper nodes by polling the various ports with Nagios' open port checks seems to cause a substantial rise in the CPU used by the ZooKeeper daemons. Over the course of a week, an idle cluster grew from a baseline of 2% to >10% CPU usage. Attached are a stack dump and logs showing the occupied threads. Eventually the daemon starts failing with "too many open files" errors as all handles are used up. 36655 No Perforce job exists for this issue. 1 32831
6 years, 24 weeks, 1 day ago 0|i05zdr:
ZooKeeper ZOOKEEPER-882

Startup loads last transaction from snapshot

Bug Closed Minor Fixed Jared Cantwell Jared Cantwell Jared Cantwell 28/Sep/10 19:46   23/Nov/11 14:22 23/Dec/10 07:43   3.3.3, 3.4.0 server   0 1   On startup, the server first loads the latest snapshot, and then loads from the log starting at the last transaction in the snapshot. It should begin from one past that last transaction in the log. I will attach a possible patch. 47565 No Perforce job exists for this issue. 5 32832
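The off-by-one is easiest to see as a filter over the transaction log. Transactions already reflected in the snapshot (zxid <= snapshot zxid) must be skipped, so replay starts at snapshot zxid + 1. The following is a minimal Java sketch that assumes a hypothetical `Txn` record with only a `zxid` field, not ZooKeeper's actual `FileTxnSnapLog` types.

```java
import java.util.ArrayList;
import java.util.List;

class SnapshotReplay {
    // Hypothetical minimal transaction record.
    static final class Txn {
        final long zxid;
        Txn(long zxid) { this.zxid = zxid; }
    }

    // Returns the log transactions to apply after loading a snapshot whose
    // last applied transaction is snapshotZxid: replay starts at snapshotZxid + 1.
    static List<Txn> toReplay(List<Txn> log, long snapshotZxid) {
        List<Txn> result = new ArrayList<>();
        for (Txn txn : log) {
            if (txn.zxid > snapshotZxid) { // same as zxid >= snapshotZxid + 1
                result.add(txn);
            }
        }
        return result;
    }
}
```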
9 years, 13 weeks, 6 days ago 0|i05zdz:
ZooKeeper ZOOKEEPER-881

ZooKeeperServer.loadData loads database twice

Bug Closed Trivial Fixed Jared Cantwell Jared Cantwell Jared Cantwell 28/Sep/10 19:41   23/Nov/11 14:21 18/Oct/10 14:30   3.3.2, 3.4.0 server   0 1   zkDb.loadDataBase() is called twice at the beginning of loadData(). It shouldn't have any negative effects, but it is unnecessary. A patch should be trivial. 47566 No Perforce job exists for this issue. 1 32833
9 years, 23 weeks, 3 days ago
Reviewed
0|i05ze7:
ZooKeeper ZOOKEEPER-880

QuorumCnxManager$SendWorker grows without bounds

Bug Closed Blocker Fixed Vishal Kher Jean-Daniel Cryans Jean-Daniel Cryans 27/Sep/10 19:40   23/Nov/11 14:22 16/Mar/11 14:49 3.4.0 3.4.0     0 4   We're seeing an issue where one server in the ensemble has a steadily growing number of QuorumCnxManager$SendWorker threads, up to the point where the OS runs out of native threads, and at the same time we see a lot of exceptions in the logs. This is on 3.2.2 and our config looks like this:

{noformat}
tickTime=3000
dataDir=/somewhere_thats_not_tmp
clientPort=2181
initLimit=10
syncLimit=5
server.0=sv4borg9:2888:3888
server.1=sv4borg10:2888:3888
server.2=sv4borg11:2888:3888
server.3=sv4borg12:2888:3888
server.4=sv4borg13:2888:3888
{noformat}

The issue is on the first server. I'm going to attach thread dumps and logs in a moment.
47567 No Perforce job exists for this issue. 9 32834
9 years, 1 week, 6 days ago
Reviewed
0|i05zef:
ZooKeeper ZOOKEEPER-879

ZOOKEEPER-835 outgoingQueue should be a class

Sub-task Open Major Unresolved Unassigned Thomas Koch Thomas Koch 23/Sep/10 04:52   23/Sep/10 12:08           0 0   I'm not 100% sure about this yet, but it seems reasonable to me.
Currently outgoingQueue is a simple list. ClientCnxn handles both the decision whether additional items may be added to the queue and the logic for adding them.

class OutgoingQueue
- isOpen
+ add(Packet) / offer(Packet)
+ poll() / take()

OutgoingQueue must know the state of SendThread and may only accept additional Packets if SendThread has not yet terminated.
OutgoingQueue knows when it must call ConnectionLoss on the remaining Packets in its queue.
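The proposed interface could look roughly like the following Java sketch. `Packet` here is a hypothetical placeholder rather than ClientCnxn's Packet. Having `close()` return the leftover packets, so the caller can fail them with ConnectionLoss, is one possible reading of the proposal. take() is left out because a blocking take on a closed queue would need extra wake-up logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the proposed OutgoingQueue class.
class OutgoingQueue {
    // Hypothetical placeholder for ClientCnxn's Packet class.
    static final class Packet {
        final String name;
        Packet(String name) { this.name = name; }
    }

    private final LinkedBlockingQueue<Packet> queue = new LinkedBlockingQueue<>();
    private boolean open = true; // guarded by this

    synchronized boolean isOpen() { return open; }

    // Accepts a packet only while the SendThread has not terminated.
    synchronized boolean offer(Packet p) {
        return open && queue.offer(p);
    }

    // Returns the next packet, or null if the queue is empty.
    Packet poll() { return queue.poll(); }

    // Called when the SendThread terminates: rejects further packets and
    // returns those still queued so they can be failed with ConnectionLoss.
    synchronized List<Packet> close() {
        open = false;
        List<Packet> remaining = new ArrayList<>();
        queue.drainTo(remaining);
        return remaining;
    }
}
```

offer() and close() share the same lock, so no packet can slip into the queue after the final drain.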
40361 No Perforce job exists for this issue. 0 42110
9 years, 27 weeks ago 0|i07kmv:
ZooKeeper ZOOKEEPER-878

ZOOKEEPER-835 finishPacket and conLossPacket should be methods of Packet

Sub-task Open Minor Unresolved Thomas Koch Thomas Koch Thomas Koch 23/Sep/10 04:36   05/Feb/20 07:16     3.7.0, 3.5.8 server   0 0   These methods change the internal state of Packet and operate on Packet, so they would be better as methods of class Packet. This may help to clarify synchronization. 70768 No Perforce job exists for this issue. 2 42111
9 years, 16 weeks, 2 days ago 0|i07kn3:
ZooKeeper ZOOKEEPER-877

zkpython does not work with python3.1

Bug Closed Major Fixed Daniel Enman TuxRacer TuxRacer 22/Sep/10 06:44   13/Mar/14 14:16 08/Oct/13 02:46 3.3.1 3.4.6, 3.5.0 contrib-bindings   0 5   linux+python3.1 as written in the contrib/zkpython/README file:


"Python >= 2.6 is required. We have tested against 2.6. We have not tested against 3.x."

This is probably more a 'new feature' request than a bug; anyway, compiling the Python module and importing it returns an error at load time:


python3.1
Python 3.1.2 (r312:79147, May 8 2010, 16:36:46)
[GCC 4.4.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import zookeeper
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: /usr/local/lib/python3.1/dist-packages/zookeeper.so: undefined symbol: PyString_AsString



Are there any plans to support Python 3.x?

I also tried to write a Python 3.1 ctypes wrapper, but the C API seems in fact to be written in C++, so Python ctypes cannot be used.
70733 No Perforce job exists for this issue. 8 2586
6 years, 2 weeks ago
Reviewed
0|i00sq7:
ZooKeeper ZOOKEEPER-876

Unnecessary snapshot transfers between new leader and followers

Bug Resolved Minor Fixed Diogo Diogo Diogo 21/Sep/10 09:37   01/Jul/13 13:28 01/Jul/13 13:28 3.4.0 3.5.0     0 3   When a new leadership starts, unnecessary snapshot transfers happen between the new leader and its followers. This is caused by several small bugs.

1) the comparison of zxids is done based on a new proposal, instead of the last logged zxid. (LearnerHandler.java ~ 297)
2) if follower is one zxid behind, the check of the interval of committed logs excludes the follower. (LearnerHandler.java ~ 277)
3) the bug reported in ZOOKEEPER-874 (commitLogs are empty after recover).
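Point 2 above is an interval-boundary problem. The following is a minimal Java sketch of the intended check. Following point 1, it assumes that both the follower's zxid and the leader's committed-log window [minCommitted, maxCommitted] refer to logged zxids. `SyncDecision`, `SyncMode` and `choose()` are hypothetical names, not LearnerHandler's code, and the truncation case is deliberately omitted.

```java
// Hypothetical sketch of the leader's diff-vs-snapshot decision.
class SyncDecision {
    // Hypothetical sync modes; TRUNC handling is omitted from this sketch.
    enum SyncMode { DIFF, SNAP }

    // Chooses DIFF when the follower's last logged zxid lies inside the
    // leader's committed-log window, with inclusive bounds on both ends.
    static SyncMode choose(long peerLastZxid, long minCommitted, long maxCommitted) {
        if (peerLastZxid >= minCommitted && peerLastZxid <= maxCommitted) {
            return SyncMode.DIFF; // send only the proposals after peerLastZxid
        }
        return SyncMode.SNAP;
    }
}
```

With inclusive bounds, a follower sitting exactly at minCommitted receives a diff instead of a full snapshot.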
67885 No Perforce job exists for this issue. 4 32835
6 years, 38 weeks, 3 days ago 0|i05zen:
ZooKeeper ZOOKEEPER-875

ResponderThread and udpSocket should be moved from QuorumPeer to LeaderElection

Improvement Open Trivial Unresolved Unassigned Diogo Diogo 17/Sep/10 13:15   17/Sep/10 13:15   3.3.1   leaderElection   0 0   Part of the algorithm implemented in the class LeaderElection is inside QuorumPeer. Is there any reason for that? ResponderThread and udpSocket belong to the LeaderElection class and should be moved into LeaderElection.java. That would make the code cleaner. 50553 No Perforce job exists for this issue. 0 42112
9 years, 27 weeks, 6 days ago 0|i07knb:
ZooKeeper ZOOKEEPER-874

FileTxnSnapLog.restore does not call listener

Bug Closed Trivial Fixed Diogo Diogo Diogo 17/Sep/10 12:01   01/May/13 22:29 13/Apr/11 12:10 3.3.1 3.4.0 leaderElection   0 2   FileTxnSnapLog.restore() does not call listener passed as parameter. The result is that the commitLogs list is empty. When a follower connects to the leader, the leader is forced to send a snapshot to the follower instead of a couple of requests and commits. 47568 No Perforce job exists for this issue. 1 32836
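The expected contract can be sketched in Java. While replaying the log on top of the snapshot, restore passes each applied transaction to the listener it was given, and that is what would populate the leader's commitLogs. `RestoreSketch` and `TxnListener` are hypothetical simplifications, not FileTxnSnapLog's actual signatures, and transactions are reduced to bare zxids.

```java
import java.util.List;

class RestoreSketch {
    // Hypothetical listener, loosely modeled on a playback callback.
    interface TxnListener {
        void onTxnLoaded(long zxid);
    }

    // Replays logged zxids that come after the snapshot, notifies the listener
    // for every transaction applied, and returns the highest zxid applied.
    static long restore(List<Long> loggedZxids, long snapshotZxid, TxnListener listener) {
        long highest = snapshotZxid;
        for (long zxid : loggedZxids) {
            if (zxid <= snapshotZxid) {
                continue; // already reflected in the snapshot
            }
            // (applying the transaction to the in-memory tree would happen here)
            listener.onTxnLoaded(zxid); // the call reported as missing
            highest = zxid;
        }
        return highest;
    }
}
```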
8 years, 50 weeks ago leader election 0|i05zev:
ZooKeeper ZOOKEEPER-873

Performance oriented leader election (POLE)

New Feature Open Minor Unresolved Unassigned Diogo Diogo 15/Sep/10 11:57   25/Jul/12 14:46           1 1   Currently, the leader is elected based on the length of its history. In heterogeneous settings, other processes can be better suited to serve as a leader, e.g., the process running on the node with best links to a majority. POLE (Performance Oriented Leader Election) will be a leader election implementation that takes into account multiple factors when selecting the leader.
50554 No Perforce job exists for this issue. 0 42113
7 years, 35 weeks, 1 day ago leader election 0|i07knj:
ZooKeeper ZOOKEEPER-872

Small fixes to PurgeTxnLog

Bug Open Minor Unresolved Vishal Kher Vishal Kher Vishal Kher 14/Sep/10 21:51   05/Feb/20 07:16   3.3.1 3.7.0, 3.5.8     0 1   PurgeTxnLog forces us to have at least 2 backups (by requiring count >= 3). Also, it prints to stdout instead of using Logger. 38 No Perforce job exists for this issue. 2 32837
8 years, 13 weeks, 1 day ago 0|i05zf3:
ZooKeeper ZOOKEEPER-871

ClientTest testClientCleanup is failing due to high fd count.

Bug Resolved Blocker Cannot Reproduce Unassigned Mahadev Konar Mahadev Konar 14/Sep/10 18:39   08/Oct/13 18:55 08/Oct/13 18:55         0 0   The fd count has increased. The tests are repeatedly failing on Hudson machines. I think this is probably related to the Netty server changes. We have to fix this before we release 3.4. 70741 No Perforce job exists for this issue. 0 32838
6 years, 24 weeks, 2 days ago 0|i05zfb:
ZooKeeper ZOOKEEPER-870

Zookeeper trunk build broken.

Bug Closed Major Fixed Mahadev Konar Mahadev Konar Mahadev Konar 14/Sep/10 18:16   23/Nov/11 14:22 15/Sep/10 01:57   3.4.0     0 1   The current ZooKeeper trunk build is broken, mostly due to some Netty changes. This is causing a huge backlog of PAs and other impediments to the review process. For now I plan to disable the tests and fix them later as part of 3.4. 47569 No Perforce job exists for this issue. 2 32839
9 years, 28 weeks, 1 day ago
Reviewed
0|i05zfj:
ZooKeeper ZOOKEEPER-869

Support for election of leader with arbitrary zxid

New Feature Open Minor Unresolved Unassigned Diogo Diogo 14/Sep/10 03:39   21/Sep/10 09:44           0 0   Currently, the implemented leader election algorithm guarantees that the leader has the maximum zxid in the ensemble. The state synchronization after the election was built on this assumption. However, other leader election algorithms might elect leaders with an arbitrary zxid.

To support other leader election algorithms, the state synchronization should allow the leader to have an arbitrary zxid.
214198 No Perforce job exists for this issue. 0 42114
9 years, 27 weeks, 2 days ago leader election 0|i07knr:
ZooKeeper ZOOKEEPER-868

ZOOKEEPER-835 Cleanups from ZOOKEEPER-823 patch

Sub-task Open Major Unresolved Unassigned Ivan Kelly Ivan Kelly 08/Sep/10 13:19   08/Sep/10 13:23           0 0   214197 No Perforce job exists for this issue. 0 42115
9 years, 29 weeks, 1 day ago 0|i07knz:
ZooKeeper ZOOKEEPER-867

ClientTest is failing on hudson - fd cleanup

Bug Closed Blocker Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 07/Sep/10 03:29   23/Nov/11 14:22 14/Sep/10 19:22 3.4.0 3.3.2, 3.4.0 tests   0 1   The client cleanup test is failing on Hudson; the fd count is off. 47570 No Perforce job exists for this issue. 1 32840
9 years, 28 weeks, 1 day ago 0|i05zfr:
ZooKeeper ZOOKEEPER-866

Adding no disk persistence option in zookeeper.

New Feature Open Major Unresolved Mahadev Konar Mahadev Konar Mahadev Konar 04/Sep/10 20:12   14/Dec/19 06:06     3.7.0     6 13   It's been seen that some folks would like to use ZooKeeper for very fine-grained locking. Also, in their use case they are fine with losing all old ZooKeeper state if they reboot ZooKeeper or ZooKeeper goes down. The use case is more of a runtime locking, where forgetting the state of locks is acceptable in case of a ZooKeeper reboot. Not logging to disk allows high throughput and low latency for writes to ZooKeeper. This would be a configuration option (of course, the default would be logging to disk).
67211 No Perforce job exists for this issue. 1 42116
6 years, 20 weeks, 1 day ago 0|i07ko7:
ZooKeeper ZOOKEEPER-865

Runaway thread

Bug Open Critical Unresolved Unassigned Stephen McCants Stephen McCants 03/Sep/10 11:07   03/Sep/10 11:09   3.3.0, 3.3.1       0 2   Linux; Java 1.6; x86; I'm starting a standalone Zookeeper server (v3.3.1). That starts normally and does not have a runaway thread.

Next, I start an Eclipse-based application that uses ZK 3.3.0 to register itself with the ZooKeeper server (3.3.1). The Eclipse application passes the following arguments to Eclipse:

-Dzoodiscovery.autoStart=true
-Dzoodiscovery.flavor=zoodiscovery.flavor.centralized=smccants.austin.ibm.com

When the Eclipse application starts, the ZK server prints out:

2010-09-03 09:59:46,006 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@250] - Accepted socket connection from /9.53.189.11:42271
2010-09-03 09:59:46,039 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@776] - Client attempting to establish new session at /9.53.189.11:42271
2010-09-03 09:59:46,045 - INFO [SyncThread:0:NIOServerCnxn@1579] - Established session 0x12ad81b90000002 with negotiated timeout 4000 for client /9.53.189.11:42271
2010-09-03 09:59:46,046 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@250] - Accepted socket connection from /9.53.189.11:42272
2010-09-03 09:59:46,078 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@776] - Client attempting to establish new session at /9.53.189.11:42272
2010-09-03 09:59:46,080 - INFO [SyncThread:0:NIOServerCnxn@1579] - Established session 0x12ad81b90000003 with negotiated timeout 4000 for client /9.53.189.11:42272

Then both the Eclipse application and the ZK server go into runaway states and consume 100% of the CPU.

Here is a view from top:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4949 smccants 15 0 597m 78m 5964 S 66.2 1.0 1:03.14 autosubmitter
4876 smccants 17 0 554m 27m 6688 S 30.9 0.3 0:34.74 java

PID 4949 (autosubmitter) is the Eclipse application and is using more than twice the CPU of PID 4876 (java) which is the ZK server. They will continue in this state indefinitely.

If I attach a debugger to the Eclipse application and stop the thread named "pool-1-thread-2-SendThread(smccants.austin.ibm.com:2181)", the runaway condition stops on both the application and the ZK server. However, the ZK server then reports:

2010-09-03 10:03:38,001 - INFO [SessionTracker:ZooKeeperServer@315] - Expiring session 0x12ad81b90000003, timeout of 4000ms exceeded
2010-09-03 10:03:38,002 - INFO [ProcessThread:-1:PrepRequestProcessor@208] - Processed session termination for sessionid: 0x12ad81b90000003
2010-09-03 10:03:38,005 - INFO [SyncThread:0:NIOServerCnxn@1434] - Closed socket connection for client /9.53.189.11:42272 which had sessionid 0x12ad81b90000003

Here is the stack trace from the suspended thread:

EPollArrayWrapper.epollWait(long, int, long, int) line: not available [native method]
EPollArrayWrapper.poll(long) line: 215
EPollSelectorImpl.doSelect(long) line: 77
EPollSelectorImpl(SelectorImpl).lockAndDoSelect(long) line: 69
EPollSelectorImpl(SelectorImpl).select(long) line: 80
ClientCnxn$SendThread.run() line: 1066

Any ideas what might be going wrong?

Thanks.
214196 No Perforce job exists for this issue. 0 32841
9 years, 29 weeks, 6 days ago 0|i05zfz:
ZooKeeper ZOOKEEPER-864

Hedwig C++ client improvements

Improvement Closed Major Fixed Ivan Kelly Ivan Kelly Ivan Kelly 03/Sep/10 10:42   23/Nov/11 14:22 11/Oct/10 15:01   3.4.0     0 1   I changed the socket code to use Boost.Asio. Now the client only creates one thread, and all operations are non-blocking.

Tests are now automated; just run "make check".
47571 No Perforce job exists for this issue. 5 33376
9 years, 24 weeks, 2 days ago
Reviewed
0|i062qv:
ZooKeeper ZOOKEEPER-863

Runaway thread - Zookeeper inside Eclipse

Bug Open Critical Unresolved Unassigned Stephen McCants Stephen McCants 03/Sep/10 10:33   03/Sep/10 14:52   3.3.0       0 2   Linux; x86 I'm running Zookeeper inside an Eclipse application. When I launch the application from inside Eclipse I use the following arguments:

-Dzoodiscovery.autoStart=true
-Dzoodiscovery.flavor=zoodiscovery.flavor.centralized=localhost

This causes the application to start its own ZooKeeper server inside the JVM/application. It immediately goes into a runaway state. The name of the runaway thread is "NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181". When I suspend this thread, the CPU usage returns to 0. Here is a stack trace from that thread when it is suspended:

EPollArrayWrapper.epollWait(long, int, long, int) line: not available [native method]
EPollArrayWrapper.poll(long) line: 215
EPollSelectorImpl.doSelect(long) line: 77
EPollSelectorImpl(SelectorImpl).lockAndDoSelect(long) line: 69
EPollSelectorImpl(SelectorImpl).select(long) line: 80
NIOServerCnxn$Factory.run() line: 232

Any ideas what might be going wrong?

Thanks.
214195 No Perforce job exists for this issue. 1 32842
9 years, 29 weeks, 6 days ago 0|i05zg7:
ZooKeeper ZOOKEEPER-862

Hedwig created ledgers with hardcoded Bookkeeper ensemble and quorum size. Make these a server config parameter instead.

Improvement Closed Major Fixed Erwin Tam Erwin Tam Erwin Tam 02/Sep/10 13:47   23/Nov/11 14:22 05/Nov/10 02:25   3.4.0 contrib-hedwig   0 1   When using BookKeeper as the persistence store, the Hedwig code currently hardcodes the number of bookie servers in the ensemble and the quorum size. These values are used when a ledger is first created. They should be exposed as server configuration parameters instead. 47572 No Perforce job exists for this issue. 1 33377
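Here is a Java sketch of reading the two values from server configuration instead of hardcoding them. The property names `bk_ensemble_size` and `bk_quorum_size`, the defaults of 3 and 2, and the `HedwigLedgerConfig` class are all hypothetical, not Hedwig's actual configuration keys. The validation assumes that the quorum size must be at least 1 and must not exceed the ensemble size.

```java
import java.util.Properties;

// Hypothetical holder for the BookKeeper ledger parameters Hedwig would use.
class HedwigLedgerConfig {
    // Hypothetical property names, for illustration only.
    static final String ENSEMBLE_KEY = "bk_ensemble_size";
    static final String QUORUM_KEY = "bk_quorum_size";

    final int ensembleSize;
    final int quorumSize;

    HedwigLedgerConfig(Properties props) {
        // Fall back to hypothetical defaults when the keys are absent.
        ensembleSize = Integer.parseInt(props.getProperty(ENSEMBLE_KEY, "3"));
        quorumSize = Integer.parseInt(props.getProperty(QUORUM_KEY, "2"));
        if (quorumSize < 1 || quorumSize > ensembleSize) {
            throw new IllegalArgumentException("quorum size " + quorumSize
                + " must be between 1 and the ensemble size " + ensembleSize);
        }
    }
}
```

Validating at startup surfaces a bad configuration immediately, instead of at the first ledger creation.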
9 years, 20 weeks, 6 days ago
Reviewed
0|i062r3:
ZooKeeper ZOOKEEPER-861

Missing the test SSL certificate used for running junit tests.

Bug Closed Minor Fixed Erwin Tam Erwin Tam Erwin Tam 02/Sep/10 13:40   23/Nov/11 14:22 07/Sep/10 14:29   3.4.0 contrib-hedwig   0 2   The Hedwig code checked into Apache is missing a test SSL certificate file used for running the server junit tests. We need this file otherwise the tests that use this (e.g. TestHedwigHub) will fail. 47573 No Perforce job exists for this issue. 2 32843
9 years, 28 weeks, 1 day ago
Reviewed
0|i05zgf:
ZooKeeper ZOOKEEPER-860

Add alternative search-provider to ZK site

Improvement Open Minor Unresolved Alex Baranau Alex Baranau Alex Baranau 02/Sep/10 10:05   05/Feb/20 07:16     3.7.0, 3.5.8 documentation   1 3   Use the search-hadoop.com service to make search available across ZK sources, mailing lists, the wiki, etc.
This was initially proposed on the user mailing list (http://search-hadoop.com/m/sTZ4Y1BVKWg1). The search service was previously added to the site skin (common to all Hadoop-related projects) as part of [AVRO-626|https://issues.apache.org/jira/browse/AVRO-626], so this issue is about enabling it for ZK. The ultimate goal is to use it on the sites of all Hadoop sub-projects.
71222 No Perforce job exists for this issue. 1 42117
5 years, 47 weeks, 6 days ago 0|i07kof:
ZooKeeper ZOOKEEPER-859

Native Windows version of C client

New Feature Closed Major Duplicate Ben Collins Ben Collins Ben Collins 31/Aug/10 14:52   23/Nov/11 14:22 13/Jul/11 22:49 3.3.1 3.4.0 c client   0 1   Windows 7, 64-bit Use Windows sockets and the Win32 API to implement the C client. This would only be useful for the "single-threaded" model, where IO waiting is handled by the calling code. 68110 No Perforce job exists for this issue. 3 33378
8 years, 37 weeks ago 0|i062rb:
ZooKeeper ZOOKEEPER-858

Zookeeper appears as QuorumPeerMain in jps output, which is not very user-friendly

Improvement Open Major Unresolved Unassigned Jeff Hammerbacher Jeff Hammerbacher 30/Aug/10 22:18   31/Aug/10 06:27           0 2   As noted by Jordan Sissel on Twitter: http://twitter.com/jordansissel/status/22570450969 214194 No Perforce job exists for this issue. 0 42118
9 years, 30 weeks, 2 days ago 0|i07kon:
ZooKeeper ZOOKEEPER-857

clarify client vs. server view of session expiration event

Bug Open Major Unresolved Unassigned qing yan qing yan 30/Aug/10 22:14   05/Feb/20 07:15     3.7.0, 3.5.8 documentation   0 0   Per mailing list discussion:

<quote>

the client only finds out about session expiration events when the client reconnects to the cluster. if zk tells a client that its session is expired, the ephemerals that correspond to that session will already be cleaned up.

- deletion of an ephemeral file due to loss of client connection will occur
after the client gets a connection loss

- deletion of an ephemeral file will precede delivery of a session
expiration event to the owner
</quote>

So session expiration means two things here: the server view (ephemeral cleanup) and the client view (event delivery), and there is
no guarantee how long it will take between the two, correct?

I guess the confusion arises from the documentation, which doesn't distinguish these two concepts, e.g. in the javadoc http://hadoop.apache.org/zookeeper/docs/r3.3.1/api/index.html

An ephemeral node will be removed by the ZooKeeper automatically when the session associated with the creation of the node expires.

It is actually referring to the server view, not the client view.
70752 No Perforce job exists for this issue. 0 32844
9 years, 30 weeks, 2 days ago 0|i05zgn:
ZooKeeper ZOOKEEPER-856

Connection imbalance leads to overloaded ZK instances

Bug Open Major Unresolved Mahadev Konar Travis Crawford Travis Crawford 26/Aug/10 15:10   05/Feb/20 07:16     3.7.0, 3.5.8     1 7   We've experienced a number of issues lately where "ruok" requests would take upwards of 10 seconds to return, and ZooKeeper instances were extremely sluggish. The sluggish instance requires a restart to make it responsive again.

I believe the issue is that connections are very imbalanced, leading to certain instances having many thousands of connections while other instances are largely idle.

A potential solution is to periodically disconnect and reconnect clients to balance connections over time; this seems fine because sessions should not be affected, and therefore ephemeral nodes and watches should not be affected.
70739 No Perforce job exists for this issue. 2 32845
10 weeks ago 0|i05zgv:
ZooKeeper ZOOKEEPER-855

clientPortBindAddress should be clientPortAddress

Bug Closed Trivial Fixed Jared Cantwell Jared Cantwell Jared Cantwell 26/Aug/10 10:49   23/Nov/11 14:22 18/Oct/10 17:56 3.3.0, 3.3.1 3.3.2, 3.4.0 documentation   0 1   The server documentation states that the configuration parameter for binding to a specific IP address is clientPortBindAddress. The code expects the parameter to be clientPortAddress. The documentation for 3.3.x versions needs to be changed to reflect the correct parameter. This parameter was added in ZOOKEEPER-635. 47574 No Perforce job exists for this issue. 2 32846
9 years, 23 weeks, 2 days ago 0|i05zh3:
ZooKeeper ZOOKEEPER-854

BookKeeper does not compile due to changes in the ZooKeeper code

Bug Closed Major Fixed Flavio Paiva Junqueira Flavio Paiva Junqueira Flavio Paiva Junqueira 19/Aug/10 16:04   23/Nov/11 14:22 28/Aug/10 12:11 3.3.1 3.4.0 contrib-bookkeeper   0 1   BookKeeper does not compile due to changes in the NIOServerCnxn class of ZooKeeper. 47575 No Perforce job exists for this issue. 2 32847
9 years, 28 weeks, 1 day ago
Reviewed
0|i05zhb:
ZooKeeper ZOOKEEPER-853

Make zookeeper.is_unrecoverable return True or False and not an integer

Improvement Closed Minor Fixed Andrei Savu Andrei Savu Andrei Savu 19/Aug/10 09:07   23/Nov/11 14:22 30/Aug/10 17:13   3.4.0 contrib-bindings   0 0   This is a patch that fixes a TODO from the python zookeeper extension, it makes {{zookeeper.is_unrecoverable}} return {{True}} or {{False}} and not an integer. 47576 No Perforce job exists for this issue. 2 33379
9 years, 28 weeks, 1 day ago zookeeper.is_unrecoverable returns True or False
Reviewed
0|i062rj:
ZooKeeper ZOOKEEPER-852

Check path validation in C client

Task Open Major Unresolved Unassigned Thomas Koch Thomas Koch 17/Aug/10 04:42   05/Feb/20 07:16     3.7.0, 3.5.8 c client   0 0   In ZOOKEEPER-849 we observed that the validation code and the documentation of allowed characters are out of sync. The validation is certainly too permissive. The issue is fixed for the Java client in ZOOKEEPER-849.
As I'm not familiar with the C client code, I am filing this separate issue in the hope that somebody may take a look at it.
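As a reference point when auditing the C client, here is a simplified Java sketch of the kind of rules a path validator enforces. The path must be absolute and must not end with a slash (except the root). It must not contain empty, "." or ".." components, null characters, or control characters. This is a hypothetical subset written for illustration (`ZNodePathCheck` is not a ZooKeeper class). The Java client's actual rule set and the documented list of disallowed characters remain authoritative.

```java
// Hypothetical, simplified znode path check; not ZooKeeper's full rule set.
class ZNodePathCheck {
    // Returns null if the path passes this simplified check, or a reason otherwise.
    static String validate(String path) {
        if (path == null || path.isEmpty()) return "path must not be empty";
        if (path.charAt(0) != '/') return "path must start with /";
        if (path.length() == 1) return null; // the root "/"
        if (path.endsWith("/")) return "path must not end with /";
        // Limit -1 keeps empty components, so "//" is detected below.
        for (String component : path.substring(1).split("/", -1)) {
            if (component.isEmpty()) return "empty path component";
            if (component.equals(".") || component.equals("..")) return "relative component";
            for (int i = 0; i < component.length(); i++) {
                char c = component.charAt(i);
                // Reject null, C0 control characters, DEL and C1 control characters.
                if (c <= 0x1f || (c >= 0x7f && c <= 0x9f)) {
                    return "invalid character in component \"" + component + "\"";
                }
            }
        }
        return null;
    }
}
```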
70749 No Perforce job exists for this issue. 0 42119
9 years, 32 weeks, 2 days ago 0|i07kov:
ZooKeeper ZOOKEEPER-851

ZK lets any node to become an observer

Bug Open Critical Unresolved Unassigned Vishal Kher Vishal Kher 16/Aug/10 10:12   14/Dec/19 06:08   3.3.1 3.7.0 quorum, server   0 6   I had a 3-node cluster running. The zoo.cfg on each node contained 3 entries, as shown below:

tickTime=2000
dataDir=/var/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.0=10.150.27.61:2888:3888
server.1=10.150.27.62:2888:3888
server.2=10.150.27.63:2888:3888

I wanted to add another node to the cluster. In the fourth node's zoo.cfg, I created another entry for that node and started the ZK server. The zoo.cfg on the first 3 nodes was left unchanged. The fourth node was able to join the cluster even though the other 3 nodes had no knowledge of it.

zoo.cfg on fourth node:
tickTime=2000
dataDir=/var/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.0=10.150.27.61:2888:3888
server.1=10.150.27.62:2888:3888
server.2=10.150.27.63:2888:3888
server.3=10.17.117.71:2888:3888

It looks like 10.17.117.71 is becoming an observer in this case. I was expecting the leader to reject 10.17.117.71.
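The behavior the reporter expected amounts to a membership check on the leader side. Here is a minimal Java sketch, assuming the leader has the server ids from its own zoo.cfg. `LearnerAdmission` and `admit()` are hypothetical names, not QuorumPeer or Leader code.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical leader-side admission check for connecting learners.
class LearnerAdmission {
    private final Set<Long> configuredServerIds;

    LearnerAdmission(Set<Long> configuredServerIds) {
        // Defensive copy of the server ids from the leader's own zoo.cfg
        this.configuredServerIds = new HashSet<>(configuredServerIds);
    }

    // Admits a learner only if its server id appears in the leader's configuration.
    boolean admit(long learnerSid) {
        return configuredServerIds.contains(learnerSid);
    }
}
```

With the three-entry zoo.cfg above, server.3 would be refused because id 3 is not in the leader's set.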

# telnet 10.17.117.71 2181
Trying 10.17.117.71...
Connected to 10.17.117.71.
Escape character is '^]'.
stat
Zookeeper version: 3.3.0--1, built on 04/02/2010 22:40 GMT
Clients:
/10.17.117.71:37297[1](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/0
Received: 3
Sent: 2
Outstanding: 0
Zxid: 0x200000065
Mode: follower
Node count: 288
61797 No Perforce job exists for this issue. 1 42120
3 years, 38 weeks, 6 days ago 0|i07kp3:
ZooKeeper ZOOKEEPER-850

Switch from log4j to slf4j

Improvement Resolved Major Fixed Olaf Krische Olaf Krische Olaf Krische 16/Aug/10 08:49   24/Feb/20 20:21 23/Jan/12 19:25 3.3.1 3.4.0 java client   10 10   ZOOKEEPER-1010 Hello,

I would like to see slf4j integrated into ZooKeeper instead of relying explicitly on log4j.

slf4j is an abstract logging framework. There are adapters from slf4j to many logger implementations, one of them is log4j.

The decision which log engine to use i dont like to make so early.

This would help me to embed zookeeper in my own applications (which use a different logger implemenation, but slf4j is the basis)

What do you think?

(as i can see, those slf4j request flood all other projects on apache as well :-)

Maybe for 3.4 or 4.0?

I can offer a patchset, i have experience in such an migration already. :-)
175 No Perforce job exists for this issue. 6 33380
6 years, 28 weeks, 2 days ago * replaces log4j with slf4j code (also in contrib for bookkeeper, zooinspector, rest, loggraph); adds slf4j dependencies to several ivy.xml files
* you must add slf4j-api-1.6.1.jar and slf4j-log4j12-1.6.1.jar (the bridge from slf4j to log4j) to the classpath if you are not using the standard scripts
* log4j still remains the final logger; there is still work to do: remove programmatic access to the log4j API from certain classes (which add appenders or configure log4j at runtime), or move them to contrib

Reviewed
0|i062rr:
ZooKeeper ZOOKEEPER-849

ZOOKEEPER-835 Provide Path class

Sub-task Open Major Unresolved Thomas Koch Thomas Koch Thomas Koch 16/Aug/10 08:28   14/Dec/19 06:08     3.7.0 java client   0 5   39 No Perforce job exists for this issue. 7 42121
6 years, 2 weeks, 6 days ago 0|i07kpb:
ZooKeeper ZOOKEEPER-848

Implement the Failure Detector module in the C client

Improvement Open Major Unresolved Unassigned Abmar Barros Abmar Barros 16/Aug/10 00:11   24/Feb/11 23:24       c client   0 0   The failure detector module https://issues.apache.org/jira/browse/ZOOKEEPER-702 is only used in the Java client of ZooKeeper, since it reuses the implementation written in Java. The failure detectors must be rewritten in C, and the C client must be refactored to use them. 214193 No Perforce job exists for this issue. 0 42122
9 years, 4 weeks, 6 days ago failure detector C client 0|i07kpj:
ZooKeeper ZOOKEEPER-847

Missing acl check in zookeeper create

Bug Open Major Unresolved Unassigned Patrick Datko Patrick Datko 13/Aug/10 09:01   05/Feb/20 07:17   3.3.1, 3.3.2, 3.3.3 3.7.0, 3.5.8 java client   0 3   I looked at the source of the ZooKeeper class and noticed a missing ACL check in the asynchronous version of the create operation. Is there a reason the async version does not
check whether the ACL is valid, or did someone forget to implement it? This matters because we are working on a refactoring of the ZooKeeper client and don't want to introduce a bug.

The following code is missing:
    if (acl != null && acl.size() == 0) {
        throw new KeeperException.InvalidACLException();
    }
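One way to keep the synchronous and asynchronous variants consistent is to route both through a single validation helper. The following is a minimal sketch under stated assumptions: CreateValidation and its methods are hypothetical, a local InvalidAclException stands in for KeeperException.InvalidACLException, and the ACL is modeled as a List<String> so the example is self-contained. Whether the async path should throw or report the error through its callback is a design choice. Here it reports through the callback.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch; names are illustrative, not ZooKeeper's real API.
final class CreateValidation {

    // Stand-in for KeeperException.InvalidACLException.
    static final class InvalidAclException extends Exception {
        InvalidAclException() {
            super("InvalidACL");
        }
    }

    // Shared check used by both the synchronous and the asynchronous path.
    static void validateAcl(List<String> acl) throws InvalidAclException {
        if (acl != null && acl.isEmpty()) {
            throw new InvalidAclException();
        }
    }

    // Synchronous create: the error surfaces as an exception.
    static String create(String path, List<String> acl) throws InvalidAclException {
        validateAcl(acl);
        return path; // a real client would send the request here
    }

    // Asynchronous create: the same check runs, and the error goes to the callback.
    static void createAsync(String path, List<String> acl, Consumer<String> callback) {
        try {
            validateAcl(acl);
            callback.accept("OK:" + path); // a real client would queue the request here
        } catch (InvalidAclException e) {
            callback.accept("ERROR:" + e.getMessage());
        }
    }
}
```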
172 No Perforce job exists for this issue. 1 32848
8 years, 26 weeks, 2 days ago acl-check 0|i05zhj:
ZooKeeper ZOOKEEPER-846

zookeeper client doesn't shut down cleanly on the close call

Bug Closed Blocker Fixed Patrick D. Hunt Ted Yu Ted Yu 12/Aug/10 13:19   23/Nov/11 14:22 22/Sep/10 02:39 3.2.2 3.3.2, 3.4.0 java client   0 3   Using HBase 0.20.6 (with HBASE-2473), we encountered a situation where the RegionServer
process was shutting down and seemed to hang.

Here is the bottom of region server log:
http://pastebin.com/YYawJ4jA

zookeeper-3.2.2 is used.

Here is the relevant portion of the jstack output (I tried twice to attach it to my email to dev@hbase.apache.org, but failed):

"DestroyJavaVM" prio=10 tid=0x00002aabb849c800 nid=0x6c60 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE

"regionserver/10.32.42.245:60020" prio=10 tid=0x00002aabb84ce000 nid=0x6c81 in Object.wait() [0x0000000043755000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00002aaab76633c0> (a org.apache.zookeeper.ClientCnxn$Packet)
at java.lang.Object.wait(Object.java:485)
at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1099)
- locked <0x00002aaab76633c0> (a org.apache.zookeeper.ClientCnxn$Packet)
at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1077)
at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:505)
- locked <0x00002aaabf5e0c30> (a org.apache.zookeeper.ZooKeeper)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.close(ZooKeeperWrapper.java:681)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:654)
at java.lang.Thread.run(Thread.java:619)

"main-EventThread" daemon prio=10 tid=0x0000000043474000 nid=0x6c80 waiting on condition [0x00000000413f3000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00002aaabf6e9150> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414)
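The trace shows close() blocked in ClientCnxn.submitRequest inside an unbounded Object.wait() on a packet that never completes. A bounded wait is one possible mitigation. The sketch below is hypothetical: PendingPacket, markFinished and awaitFinished are illustrative names, and this is not the fix that was committed for this issue.

```java
// Hypothetical sketch of a request packet whose completion can be awaited with a deadline.
final class PendingPacket {
    private boolean finished;

    // Called by the I/O thread when the reply (or a connection loss) has been processed.
    synchronized void markFinished() {
        finished = true;
        notifyAll();
    }

    // Waits at most timeoutMillis; returns false if the packet never completed.
    synchronized boolean awaitFinished(long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!finished) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false;
            }
            wait(remaining); // the loop also guards against spurious wakeups
        }
        return true;
    }
}
```

With this shape, a close() path could call awaitFinished and, on false, tear the socket down anyway instead of hanging the RegionServer's shutdown.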
47577 No Perforce job exists for this issue. 2 32849
9 years, 27 weeks, 1 day ago
Reviewed
0|i05zhr:
ZooKeeper ZOOKEEPER-845

remove duplicate code from netty and nio ServerCnxn classes

Improvement Resolved Major Duplicate Mohammad Arshad Benjamin Reed Benjamin Reed 12/Aug/10 12:53   08/Aug/16 10:30 08/Aug/16 10:30   3.5.1 server   1 3   The code for handling the four-letter words is duplicated between the NIO and Netty versions of ServerCnxn, which makes maintenance problematic. 70783 No Perforce job exists for this issue. 0 42123
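A common way to remove such duplication is a transport-independent command table that both connection classes delegate to. The sketch below is hypothetical: FourLetterCommands, register and handle are illustrative names. Only the "ruok" to "imok" reply reflects documented ZooKeeper behavior. Other commands would be registered the same way.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical transport-independent handler for four-letter words.
// Both the NIO and the Netty connection classes would delegate to one instance,
// so each command is implemented exactly once.
final class FourLetterCommands {
    private final Map<String, Supplier<String>> commands = new HashMap<>();

    FourLetterCommands() {
        commands.put("ruok", () -> "imok");
    }

    // Adds or replaces the handler for a command word.
    void register(String word, Supplier<String> handler) {
        commands.put(word, handler);
    }

    // Returns the response text, or null if the word is not a known command.
    String handle(String word) {
        Supplier<String> cmd = commands.get(word);
        return cmd == null ? null : cmd.get();
    }
}
```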
3 years, 32 weeks, 3 days ago 0|i07kpr:
ZooKeeper ZOOKEEPER-844

handle auth failure in java client

Bug Closed Major Fixed Camille Fournier Camille Fournier Camille Fournier 12/Aug/10 12:05   23/Nov/11 14:22 06/Oct/10 12:19 3.3.1 3.3.2, 3.4.0 java client   0 1   ClientCnxn.java currently has the following code:
    if (replyHdr.getXid() == -4) {
        // -2 is the xid for AuthPacket
        // TODO: process AuthPacket here
        if (LOG.isDebugEnabled()) {
            LOG.debug("Got auth sessionid:0x"
                    + Long.toHexString(sessionId));
        }
        return;
    }

Auth failures appear to cause the server to disconnect, but the client never gets a proper state change or notification that auth has failed. This makes the scenario very difficult to handle, because the client goes into a loop of sending bad auth, getting disconnected, reconnecting, and sending bad auth again, over and over.
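Instead of returning silently, the client could inspect the error code carried with the auth reply and surface the failure. This is a minimal sketch under stated assumptions: AuthReplyHandler, its State enum and the deliveredEvents list are hypothetical stand-ins for ClientCnxn's state handling and watcher notification. Only the -4 xid comes from the snippet above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of auth-reply handling; names are illustrative, not ClientCnxn's.
final class AuthReplyHandler {
    static final int AUTH_XID = -4; // xid checked in the snippet above
    static final int OK = 0;

    enum State { CONNECTED, AUTH_FAILED }

    State state = State.CONNECTED;
    boolean reconnectAllowed = true;
    final List<String> deliveredEvents = new ArrayList<>();

    // err is the error code carried in the reply header.
    void onReply(int xid, int err) {
        if (xid != AUTH_XID) {
            return; // other replies are handled elsewhere
        }
        if (err != OK) {
            // Surface the failure and stop the
            // send-bad-auth / disconnect / reconnect loop.
            state = State.AUTH_FAILED;
            reconnectAllowed = false;
            deliveredEvents.add("AuthFailed");
        }
    }
}
```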
47578 No Perforce job exists for this issue. 2 32850
9 years, 25 weeks ago
Reviewed
0|i05zhz:
ZooKeeper ZOOKEEPER-843

ZOOKEEPER-835 Session class?

Sub-task Open Major Unresolved Unassigned Patrick Datko Patrick Datko 11/Aug/10 11:36   11/Aug/10 11:57   3.3.1   java client   0 1   Maybe it'd make sense to combine hostlist, sessionId, sessionPassword and
sessionTimeout in a Session class so that the ctor of ClientCnxn won't get too
long?
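A value class along these lines would let the ClientCnxn constructor take one argument instead of four. The sketch below is hypothetical: the class name, field names and the validation rule are illustrative, and the defensive copies of the password are a design choice, not something the issue requires.

```java
import java.util.Arrays;

// Hypothetical value class bundling the session parameters that ClientCnxn's
// constructor currently takes one by one.
final class Session {
    final String hostList;
    final long sessionId;
    private final byte[] sessionPassword;
    final int sessionTimeoutMs;

    Session(String hostList, long sessionId, byte[] sessionPassword, int sessionTimeoutMs) {
        if (sessionTimeoutMs <= 0) {
            throw new IllegalArgumentException("session timeout must be positive");
        }
        this.hostList = hostList;
        this.sessionId = sessionId;
        // Copy on the way in so later changes to the caller's array do not leak in.
        this.sessionPassword = Arrays.copyOf(sessionPassword, sessionPassword.length);
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    // Copy on the way out so callers cannot mutate the stored password.
    byte[] password() {
        return Arrays.copyOf(sessionPassword, sessionPassword.length);
    }
}
```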
214192 No Perforce job exists for this issue. 0 42124
9 years, 33 weeks, 1 day ago session, session class, refactored class 0|i07kpz:
ZooKeeper ZOOKEEPER-842

ZOOKEEPER-835 stat calls static method on org.apache.zookeeper.server.DataTree

Sub-task Open Major Unresolved Unassigned Patrick Datko Patrick Datko 11/Aug/10 11:34   11/Aug/10 11:35   3.3.1   java client   0 1   It's a huge jump from client code to the internal server class DataTree.
Shouldn't there rather be some class related to the protobuffer stat class
that knows how to copy a stat?
214191 No Perforce job exists for this issue. 0 42125
9 years, 33 weeks, 1 day ago DataTree, protobuffer 0|i07kq7:
ZooKeeper ZOOKEEPER-841

ZOOKEEPER-835 stat is returned by parameter

Sub-task Open Major Unresolved Unassigned Patrick Datko Patrick Datko 11/Aug/10 10:57   11/Aug/10 10:59   3.3.1   java client   0 1   Since a Java method can return only one value, returning the stat through a parameter is the only option.
Still, it feels more like C than like Java. With operation classes, however, one
could simply get the result values through getter methods after execution.
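The operation-class alternative can be sketched as an object that carries its inputs and exposes its results through getters once it has run. Everything below is hypothetical: GetDataOperation, complete, and the two-field StatInfo standing in for org.apache.zookeeper.data.Stat are illustrative only.

```java
// Hypothetical sketch: an operation object that exposes its results through
// getters instead of filling in a Stat passed by the caller.
final class GetDataOperation {
    // Minimal stand-in for org.apache.zookeeper.data.Stat.
    static final class StatInfo {
        final int version;
        final long mzxid;

        StatInfo(int version, long mzxid) {
            this.version = version;
            this.mzxid = mzxid;
        }
    }

    private final String path;
    private byte[] data;
    private StatInfo stat;

    GetDataOperation(String path) {
        this.path = path;
    }

    String getPath() {
        return path;
    }

    // Called by whatever executes the operation once the reply arrives.
    void complete(byte[] data, StatInfo stat) {
        this.data = data;
        this.stat = stat;
    }

    byte[] getData() {
        return data;
    }

    StatInfo getStat() {
        return stat;
    }
}
```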
214190 No Perforce job exists for this issue. 0 42126
9 years, 33 weeks, 1 day ago 0|i07kqf:
ZooKeeper ZOOKEEPER-840

ZOOKEEPER-835 massive code duplication in zookeeper class

Sub-task Open Major Unresolved Thomas Koch Patrick Datko Patrick Datko 11/Aug/10 10:56   16/Aug/10 08:20           0 1   Each operation calls validatePath, handles the chroot, calls ClientCnxn and
checks the return header for error. I'd like to address this with the
operation classes:
Each operation should receive a prechecked Path object. Calling ClientCnxn and
error checking is not (or only partly) the concern of the operation but of an
"executor" like class.
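The proposed split can be sketched as an executor that owns the steps every method currently repeats. This is a hypothetical illustration: OperationExecutor is an invented name, the transport function stands in for ClientCnxn, and validatePath here is a simplified check, much weaker than ZooKeeper's real path validation.

```java
import java.util.function.Function;

// Hypothetical sketch of an "executor" that owns the steps every ZooKeeper method
// currently repeats: path validation, chroot handling, sending, and error checking.
final class OperationExecutor {
    private final String chroot; // e.g. "/app", or "" for none

    OperationExecutor(String chroot) {
        this.chroot = chroot;
    }

    // Simplified validation for illustration only.
    static void validatePath(String path) {
        if (path == null || !path.startsWith("/") || (path.length() > 1 && path.endsWith("/"))) {
            throw new IllegalArgumentException("Invalid path: " + path);
        }
    }

    // transport stands in for ClientCnxn: it receives the server-side path and
    // returns an error code (0 means success).
    void execute(String clientPath, Function<String, Integer> transport) {
        validatePath(clientPath);
        String serverPath = chroot.isEmpty() ? clientPath
                : ("/".equals(clientPath) ? chroot : chroot + clientPath);
        int err = transport.apply(serverPath);
        if (err != 0) {
            throw new IllegalStateException("Operation on " + clientPath + " failed with code " + err);
        }
    }
}
```

Each operation then only needs to describe its own request and reply, while the executor handles the shared steps in one place.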
214189 No Perforce job exists for this issue. 0 42127
9 years, 33 weeks, 1 day ago code duplication 0|i07kqn:
ZooKeeper ZOOKEEPER-839

ZOOKEEPER-835 deleteRecursive does not belong to the other methods

Sub-task Closed Blocker Fixed Mahadev Konar Patrick Datko Patrick Datko 11/Aug/10 10:53   23/Nov/11 14:22 14/Aug/11 12:41 3.3.1 3.4.0 java client   0 0   deleteRecursive has already been committed to trunk as a method of the
ZooKeeper class, so in the API it sits at the same level as the atomic operations
create, delete, getData, setData, etc. Users may get the false impression
that deleteRecursive is also an atomic operation.
It would be better to have deleteRecursive in some helper class rather than that
deep in ZooKeeper's core code. Also, I might want a different policy for how to
react if deleteRecursive fails in the middle of its work.
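A helper-class version makes the non-atomicity explicit: it is simply a sequence of ordinary deletes, children first, with a caller-chosen failure policy. The sketch below is hypothetical. RecursiveDelete, its Tree interface and the OnFailure policy are illustrative, not ZooKeeper's ZKUtil API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper-class sketch: recursive delete as a sequence of ordinary,
// individually atomic deletes. The operation as a whole is NOT atomic.
final class RecursiveDelete {

    // Minimal view of the client operations the helper needs (illustrative only).
    interface Tree {
        List<String> getChildren(String path);
        void delete(String path) throws Exception;
    }

    enum OnFailure { STOP, CONTINUE }

    // Deletes children before parents; returns the paths that could not be deleted.
    static List<String> deleteRecursive(Tree tree, String root, OnFailure policy) {
        List<String> failed = new ArrayList<>();
        delete(tree, root, policy, failed);
        return failed;
    }

    private static boolean delete(Tree tree, String path, OnFailure policy, List<String> failed) {
        for (String child : tree.getChildren(path)) {
            String childPath = path.equals("/") ? "/" + child : path + "/" + child;
            // STOP gives up on this subtree at the first failure; CONTINUE keeps going,
            // in which case the non-empty parent will also fail to delete below.
            if (!delete(tree, childPath, policy, failed) && policy == OnFailure.STOP) {
                return false;
            }
        }
        try {
            tree.delete(path);
            return true;
        } catch (Exception e) {
            failed.add(path);
            return false;
        }
    }
}
```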
47579 No Perforce job exists for this issue. 1 33381
8 years, 32 weeks, 3 days ago
Reviewed
atomic operations 0|i062rz:
ZooKeeper ZOOKEEPER-838

ZOOKEEPER-835 Chroot is an attribute of ClientCnxn

Sub-task Open Major Unresolved Unassigned Patrick Datko Patrick Datko 11/Aug/10 10:50   21/Dec/10 15:59           0 1   It can be useful to have a single process that uses ZooKeeper for different things
(managing a list of work, taking some unrelated locks elsewhere), with separate
components doing this work inside the same process. These components should each
get a reference to the same ZooKeeper client, chroot'ed for their own needs.
So it would be much better if ClientCnxn did not care about the chroot.
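Moving the chroot out of the connection could look like a lightweight view object layered over one shared client. The sketch is hypothetical: ChrootedView and its one-method Client interface are illustrative stand-ins, and only create is shown. Other operations would translate paths the same way.

```java
// Hypothetical sketch: the chroot lives in a lightweight view object, so several
// components in one process can share a single connection with different roots.
final class ChrootedView {

    // Stand-in for the shared, un-chrooted client connection.
    interface Client {
        String create(String absolutePath);
    }

    private final Client shared;
    private final String root; // e.g. "/work-queue"

    ChrootedView(Client shared, String root) {
        this.shared = shared;
        this.root = root;
    }

    private String toServerPath(String path) {
        return "/".equals(path) ? root : root + path;
    }

    // Returns the created path relative to this view's root.
    String create(String path) {
        String created = shared.create(toServerPath(path));
        String relative = created.substring(root.length());
        return relative.isEmpty() ? "/" : relative;
    }
}
```

A work-queue component and a locking component could then each wrap the same connection with a different root, and ClientCnxn would never see a chroot.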
214188 No Perforce job exists for this issue. 0 42128
9 years, 33 weeks ago chroot 0|i07kqv:
ZooKeeper ZOOKEEPER-837

ZOOKEEPER-835 cyclic dependency ClientCnxn, ZooKeeper

Sub-task Open Major Unresolved Thomas Koch Patrick Datko Patrick Datko 11/Aug/10 10:47   05/Feb/20 07:16   3.3.1 3.7.0, 3.5.8 java client   0 2 0 20400   ZooKeeper instantiates ClientCnxn in its ctor, passing itself (this), and therefore builds a
cyclic dependency graph between the two objects. This means you can't have the
one without the other. So why make them two separate classes
in the first place?
ClientCnxn accesses ZooKeeper.state. State should rather be a property of
ClientCnxn. And ClientCnxn accesses zooKeeper.get???Watches() in its method
primeConnection(). I have not yet checked how this dependency could be
resolved better.
100% 100% 20400 0 pull-request-available 60545 No Perforce job exists for this issue. 5 42129
8 years, 35 weeks, 1 day ago cyclic dependency 0|i07kr3:
ZooKeeper ZOOKEEPER-836

ZOOKEEPER-835 hostlist as string

Sub-task Resolved Major Fixed Thomas Koch Patrick Datko Patrick Datko 11/Aug/10 10:46   01/Dec/10 05:52 30/Nov/10 15:47 3.3.1   java client   0 1   The hostlist is parsed in the ctor of ClientCnxn. This violates the rule of
not doing (too much) work in a ctor. Instead the ClientCnxn should receive an
object of class "HostSet". HostSet could then be instantiated e.g. with a
comma separated string.
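A HostSet along these lines moves parsing into a factory method, so the ClientCnxn constructor would only receive an already-validated object. The sketch is hypothetical: the class shape and the defaultPort parameter are assumptions. It uses InetSocketAddress.createUnresolved so that no DNS lookup happens during parsing, and it does not handle bracketed IPv6 literals.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical HostSet: parsing happens in a factory method, so the ClientCnxn
// constructor would only receive an already-validated object.
final class HostSet {
    private final List<InetSocketAddress> hosts;

    private HostSet(List<InetSocketAddress> hosts) {
        this.hosts = Collections.unmodifiableList(hosts);
    }

    // Parses "host1:2181,host2:2181"; a host without a port gets defaultPort.
    static HostSet fromConnectString(String connectString, int defaultPort) {
        List<InetSocketAddress> parsed = new ArrayList<>();
        for (String part : connectString.split(",")) {
            String hostPort = part.trim();
            if (hostPort.isEmpty()) {
                continue;
            }
            int colon = hostPort.lastIndexOf(':');
            String host = colon < 0 ? hostPort : hostPort.substring(0, colon);
            int port = colon < 0 ? defaultPort : Integer.parseInt(hostPort.substring(colon + 1));
            // createUnresolved: no DNS lookup while parsing.
            parsed.add(InetSocketAddress.createUnresolved(host, port));
        }
        if (parsed.isEmpty()) {
            throw new IllegalArgumentException("no hosts in: " + connectString);
        }
        return new HostSet(parsed);
    }

    List<InetSocketAddress> hosts() {
        return hosts;
    }
}
```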
47580 No Perforce job exists for this issue. 5 33382
9 years, 17 weeks, 1 day ago
Reviewed
hostliste, comma seperated 0|i062s7:
ZooKeeper ZOOKEEPER-835

Refactoring Zookeeper Client Code

Improvement Open Major Unresolved Thomas Koch Patrick Datko Patrick Datko 11/Aug/10 10:41   27/Dec/10 23:53   3.3.1   java client   0 3   ZOOKEEPER-836, ZOOKEEPER-837, ZOOKEEPER-838, ZOOKEEPER-839, ZOOKEEPER-840, ZOOKEEPER-841, ZOOKEEPER-842, ZOOKEEPER-843, ZOOKEEPER-849, ZOOKEEPER-868, ZOOKEEPER-878, ZOOKEEPER-879, ZOOKEEPER-894, ZOOKEEPER-908, ZOOKEEPER-910, ZOOKEEPER-969, ZOOKEEPER-970 Thomas Koch asked me to file individual issues for the points raised in his mail to zookeeper-dev:
[Mail of Thomas Koch| http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-dev/201008.mbox/%3C201008111145.17507.thomas@koch.ro%3E ]


He pointed out several issues in the current ZooKeeper client, so refactoring the code would make things easier for other developers working with ZooKeeper.
100% 20400 0 214187 No Perforce job exists for this issue. 0 42130
9 years, 23 weeks ago Zookeeper client code, refactoring, improvement, client, code 0|i07krb:
ZooKeeper ZOOKEEPER-834

Allow ephemeral znodes to have children created only by the owner session.

New Feature Resolved Major Duplicate Rakesh Radhakrishnan Andrei Savu Andrei Savu 06/Aug/10 15:58   20/Mar/19 11:28 20/Mar/19 11:28     c client, java client, server   3 12   Ephemeral znodes are automatically removed when the client session is closed or expires and this behavior makes them very useful when you want to publish status information from active / connected clients.

But there is a catch. Right now ephemerals can't have children znodes and because of that clients need to serialize status information as byte strings. This serialization renders that information almost invisible to generic zookeeper clients and hard / inefficient to update.

Most of the time the status information can be expressed as a bunch of (key, value) pairs and we could easily store that using child znodes. Any ZooKeeper client can read that info without the need to reverse the serialization process and we can also easily update it.

I suggest that the server should allow ephemeral znodes to have child znodes. Each child should also be an ephemeral znode owned by the same session as the parent (i.e. the parent's ephemeralOwner session).
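The proposed rule can be stated as a small server-side predicate. This is a hypothetical sketch. EphemeralChildRule and mayCreateChild are invented names, and the only ZooKeeper fact it relies on is that a persistent znode's ephemeralOwner is 0.

```java
// Hypothetical sketch of the proposed server-side rule: a child may be created
// under an ephemeral parent only if the child is itself ephemeral and is owned
// by the same session as the parent (the parent's ephemeralOwner).
final class EphemeralChildRule {
    static final long NOT_EPHEMERAL = 0L; // ephemeralOwner of a persistent znode

    static boolean mayCreateChild(long parentEphemeralOwner, boolean childEphemeral, long requestSessionId) {
        if (parentEphemeralOwner == NOT_EPHEMERAL) {
            return true; // persistent parents are unaffected by the proposal
        }
        return childEphemeral && requestSessionId == parentEphemeralOwner;
    }
}
```

Because parent and children share one owner session, session expiry would remove the whole status subtree together, which keeps the existing cleanup semantics of ephemerals.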

Mail Archive:
http://www.mail-archive.com/zookeeper-dev@hadoop.apache.org/msg09819.html

Another discussion about the same topic:
http://www.mail-archive.com/zookeeper-dev@hadoop.apache.org/msg08165.html
container_znode_type 40 No Perforce job exists for this issue. 4 2584
1 year, 1 day ago 0|i00spr:
ZooKeeper ZOOKEEPER-833

Attachments in the wiki do not work (so no presentations)

Bug Resolved Major Fixed Thomas Koch Bruce Mitchener Bruce Mitchener 06/Aug/10 15:17   05/Sep/11 15:55 05/Sep/11 15:55     documentation   0 1   This is apparently a known issue:

http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-user/201005.mbox/%3C562709E0-0516-481F-87AD-2039A564E5BD@yahoo-inc.com%3E

None of the attachments on the Presentations page in the wiki work (nor does the link to the screenshot on the performance page).
47581 No Perforce job exists for this issue. 0 32851
8 years, 29 weeks, 3 days ago 0|i05zi7:
ZooKeeper ZOOKEEPER-832

Invalid session id causes infinite loop during automatic reconnect

Bug Patch Available Critical Unresolved Mohammad Arshad Ryan Holmes Ryan Holmes 05/Aug/10 15:16   02/Jul/19 21:49   3.4.5, 3.5.0, 3.4.11   server   13 41   All Steps to reproduce:

1.) Connect to a standalone server using the Java client.
2.) Stop the server.
3.) Delete the contents of the data directory (i.e. the persisted session data).
4.) Start the server.

The client now automatically tries to reconnect but the server refuses the connection because the session id is invalid. The client and server are now in an infinite loop of attempted and rejected connections. While this situation represents a catastrophic failure and the current behavior is not incorrect, it appears that there is no way to detect this situation on the client and therefore no way to recover.

The suggested improvement is to send an event to the default watcher indicating that the current state is "session invalid", similar to how the "session expired" state is handled.

Server log output (repeats indefinitely):
2010-08-05 11:48:08,283 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@250] - Accepted socket connection from /127.0.0.1:63292
2010-08-05 11:48:08,284 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@751] - Refusing session request for client /127.0.0.1:63292 as it has seen zxid 0x44 our last zxid is 0x0 client must try another server
2010-08-05 11:48:08,284 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1434] - Closed socket connection for client /127.0.0.1:63292 (no session established for client)


Client log output (repeats indefinitely):
11:47:17 org.apache.zookeeper.ClientCnxn startConnect INFO line 1000 - Opening socket connection to server localhost/127.0.0.1:2181
11:47:17 org.apache.zookeeper.ClientCnxn run WARN line 1120 - Session 0x12a3ae4e893000a for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1167 - Ignoring exception during shutdown input
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1164)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129)
11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1174 - Ignoring exception during shutdown output
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1171)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129)
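The server log shows the refusal comes from comparing the client's last seen zxid (0x44) with the server's (0x0). The sketch below is hypothetical: SessionRequestCheck and its Outcome enum are illustrative. The SESSION_INVALID outcome models the improvement proposed above and is not an existing ZooKeeper state.

```java
// Hypothetical sketch: the comparison behind the "Refusing session request" log
// line, extended with a distinct outcome the client could surface to the default
// watcher as "session invalid" (the improvement proposed above).
final class SessionRequestCheck {
    enum Outcome { ACCEPT, TRY_ANOTHER_SERVER, SESSION_INVALID }

    static Outcome check(long clientLastZxid, long serverLastZxid, boolean standalone) {
        if (clientLastZxid <= serverLastZxid) {
            return Outcome.ACCEPT;
        }
        // In an ensemble another server may be further ahead. A standalone server
        // has no other server to offer, so retrying would loop indefinitely.
        return standalone ? Outcome.SESSION_INVALID : Outcome.TRY_ANOTHER_SERVER;
    }
}
```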
63862 No Perforce job exists for this issue. 10 42131
37 weeks, 1 day ago 0|i07krj:
ZooKeeper ZOOKEEPER-831

BookKeeper: Throttling improved for reads

Bug Closed Major Fixed Flavio Paiva Junqueira Flavio Paiva Junqueira Flavio Paiva Junqueira 04/Aug/10 17:25   01/May/13 22:29 17/Sep/10 12:59 3.3.1 3.4.0 contrib-bookkeeper   0 2   Reads and writes in BookKeeper are asymmetric: a write request writes one entry, whereas a read request may read multiple entries. The current implementation of throttling only counts the number of read requests instead of the number of entries being read. Consequently, a few read requests that each read a large number of entries will spawn a large number of read-entry requests. 47582 No Perforce job exists for this issue. 4 32852
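Counting entries rather than requests can be sketched with one java.util.concurrent.Semaphore permit per outstanding entry. This is a hypothetical illustration, not BookKeeper's actual client code. EntryThrottle is an invented name, and a blocking variant would call acquire(entries) instead of tryAcquire(entries).

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch: throttle reads by the number of entries requested rather
// than by the number of read requests, using one semaphore permit per entry.
final class EntryThrottle {
    private final Semaphore permits;

    EntryThrottle(int maxOutstandingEntries) {
        this.permits = new Semaphore(maxOutstandingEntries);
    }

    // Returns false (instead of blocking) if the read would exceed the budget.
    boolean tryStartRead(int entries) {
        return permits.tryAcquire(entries);
    }

    // Called once per entry as each read-entry response completes.
    void entryCompleted() {
        permits.release();
    }

    int available() {
        return permits.availablePermits();
    }
}
```

A write would take a single permit, so reads and writes are charged consistently per entry. Note that a single read asking for more entries than the whole budget could never start and would need to be split.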
9 years, 27 weeks, 5 days ago 0|i05zif:
ZooKeeper ZOOKEEPER-830

ZOOKEEPER-704 forrest docs for read-only mode

Sub-task Open Major Unresolved Sergey Doroshenko Sergey Doroshenko Sergey Doroshenko 02/Aug/10 15:36   05/Feb/16 12:38           0 1   214186 No Perforce job exists for this issue. 2 42132
4 years, 6 weeks, 6 days ago 0|i07krr:
ZooKeeper ZOOKEEPER-829

Add /zookeeper/sessions/* to allow inspection/manipulation of client sessions

New Feature Open Major Unresolved Marshall McMullen Todd Lipcon Todd Lipcon 29/Jul/10 13:25   13/Dec/12 03:09       server   1 12   For some use cases in HBase (HBASE-1316 in particular) we'd like the ability to forcibly expire someone else's ZK session. Patrick and I discussed this on IRC and came up with the idea of creating nodes in /zookeeper/sessions/<session id> that can be read in order to get basic stats about a session, and written in order to manipulate one. The manipulation we need in HBase is the ability to write a command like "kill", but others might be useful as well. 214185 No Perforce job exists for this issue. 2 42133
8 years, 4 weeks, 2 days ago 0|i07krz:
ZooKeeper ZOOKEEPER-827

ZOOKEEPER-704 enable r/o mode in C client library

Sub-task Resolved Major Fixed Raúl Gutiérrez Segalés Sergey Doroshenko Sergey Doroshenko 21/Jul/10 14:06   02/May/15 16:34 07/Jul/14 17:44   3.5.0     0 6   Implement read-only mode functionality (in accordance with http://wiki.apache.org/hadoop/ZooKeeper/GSoCReadOnlyMode) in C client library 214184 No Perforce job exists for this issue. 10 42134
4 years, 46 weeks, 5 days ago 0|i07ks7:
ZooKeeper ZOOKEEPER-826

cli.c should not call zoo_add_auth immediately after zookeeper_init()

Bug Open Minor Unresolved Unassigned Michi Mutsuzaki Michi Mutsuzaki 21/Jul/10 03:13   21/Jul/10 05:06   3.3.1   c client   0 0   In cli.c, zoo_add_auth() gets called right after zookeeper_init(). Instead, zoo_add_auth() should be called in the callback after the connection is established.

--Michi
214183 No Perforce job exists for this issue. 0 32853
9 years, 36 weeks, 1 day ago 0|i05zin:
ZooKeeper ZOOKEEPER-823

update ZooKeeper java client to optionally use Netty for connections

New Feature Open Major Unresolved Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 19/Jul/10 17:57   05/Feb/20 07:17     3.7.0, 3.5.8 java client   2 10   ZOOKEEPER-909 This jira will port the client side connection code to use netty rather than direct nio. 63346 No Perforce job exists for this issue. 18 42135
5 years, 29 weeks, 2 days ago 0|i07ksf:
ZooKeeper ZOOKEEPER-822

Leader election taking a long time to complete

Bug Closed Blocker Fixed Vishal Kher Vishal Kher Vishal Kher 19/Jul/10 11:51   23/Nov/11 14:22 06/Oct/10 13:03 3.3.0 3.3.2, 3.4.0 quorum   0 4   Created a 3-node cluster.

1. Fail the ZK leader.
2. Let leader election finish. Restart the leader and let it join the cluster.
3. Repeat.

After a few rounds, leader election takes anywhere from 25 to 60 seconds to finish. Note: we didn't have any ZK clients, and no new znodes were created.

zoo.cfg is shown below:

#Mon Jul 19 12:15:10 UTC 2010
server.1=192.168.4.12\:2888\:3888
server.0=192.168.4.11\:2888\:3888
clientPort=2181
dataDir=/var/zookeeper
syncLimit=2
server.2=192.168.4.13\:2888\:3888
initLimit=5
tickTime=2000

I have attached logs from the two nodes that took a long time to form the cluster after the leader was failed. The leader was down anyway, so logs from that node shouldn't matter.
Look for "START HERE". The logs after that point are the ones of interest.
47583 No Perforce job exists for this issue. 17 32854
9 years, 25 weeks ago
Reviewed
0|i05ziv:
ZooKeeper ZOOKEEPER-821

Add ZooKeeper version information to zkpython

Improvement Closed Trivial Fixed Rich Schumacher Rich Schumacher Rich Schumacher 16/Jul/10 17:30   23/Nov/11 14:22 26/Jul/10 17:50 3.3.1 3.4.0 contrib-bindings   0 0   Since installing and using ZooKeeper I've built and installed no less than four versions of the zkpython bindings. It would be really helpful if the module had a '__version__' attribute to easily tell which version is currently in use. 47584 No Perforce job exists for this issue. 1 33383
9 years, 35 weeks, 2 days ago Add a version number to zkpython releases.
Reviewed
0|i062sf:
ZooKeeper ZOOKEEPER-820

update c unit tests to ensure "zombie" java server processes don't cause failure

Bug Closed Critical Fixed Michi Mutsuzaki Patrick D. Hunt Patrick D. Hunt 16/Jul/10 12:00   23/Nov/11 14:22 20/Oct/10 14:46 3.3.1 3.3.2, 3.4.0     0 1   When the C unit tests are run, sometimes the server doesn't shut down at the end of the test; this causes subsequent tests (esp. on Hudson) to fail.

1) We should try harder to make the server shut down at the end of the test; I suspect this is related to test failure/cleanup.
2) Before the tests are run, we should check whether the old server is still running and try to shut it down.
47585 No Perforce job exists for this issue. 4 32855
9 years, 23 weeks, 1 day ago
Reviewed
0|i05zj3:
ZooKeeper ZOOKEEPER-819

ZOOKEEPER-816 build the checking tool

Sub-task Open Minor Unresolved Unassigned Miguel Correia Miguel Correia 16/Jul/10 11:03   16/Jul/10 11:03           0 0   Building the checking tool is the hardest part of the project. It involves putting the traces together in a unified trace and checking if this unified trace shows that Zookeeper is satisfying a set of properties (e.g., a getData returns what was stored by the previous setData or create). 214182 No Perforce job exists for this issue. 0 42136
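One property from the description can be sketched as a single pass over an already-merged trace. The sketch is hypothetical: TraceChecker and its Event shape are invented, and it assumes the per-server traces have been merged into one totally ordered trace, which is the hard part of the project.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of one property the checking tool could verify, assuming
// the per-server traces have already been merged into a single totally ordered trace.
final class TraceChecker {
    static final class Event {
        final String op;    // "create", "setData" or "getData"
        final String path;
        final String value; // value written, or value returned by getData

        Event(String op, String path, String value) {
            this.op = op;
            this.path = path;
            this.value = value;
        }
    }

    // Returns the indexes of getData events that did not return the last value written.
    static List<Integer> violations(List<Event> trace) {
        Map<String, String> lastWritten = new HashMap<>();
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < trace.size(); i++) {
            Event e = trace.get(i);
            switch (e.op) {
                case "create":
                case "setData":
                    lastWritten.put(e.path, e.value);
                    break;
                case "getData":
                    if (!Objects.equals(lastWritten.get(e.path), e.value)) {
                        bad.add(i);
                    }
                    break;
                default:
                    break; // other operations are not checked in this sketch
            }
        }
        return bad;
    }
}
```

ZooKeeper reads served by a follower may legitimately be stale relative to other clients' writes, so a real checker would have to encode ZooKeeper's actual guarantees (for example per-client ordering) rather than this strict rule.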
9 years, 36 weeks, 6 days ago 0|i07ksn:
ZooKeeper ZOOKEEPER-818

ZOOKEEPER-816 improve the traces with additional information needed

Sub-task Open Minor Unresolved Unassigned Miguel Correia Miguel Correia 16/Jul/10 11:01   16/Jul/10 11:01           0 0   The current traces do not include all the information we need to do the checking. The main additions would be to log the replies and hashes of values read/written. 214181 No Perforce job exists for this issue. 0 42137
9 years, 36 weeks, 6 days ago 0|i07ksv:
ZooKeeper ZOOKEEPER-817

ZOOKEEPER-816 improve the efficiency of tracing

Sub-task Open Minor Unresolved Unassigned Miguel Correia Miguel Correia 16/Jul/10 11:00   16/Jul/10 11:00           0 0   ZooKeeper uses two kinds of logs: logs for information and debugging (the ones considered in this project) and transaction logs (needed for Zab/Paxos to be fault tolerant). The latter are very efficient, so the idea is to make the former equally efficient using similar mechanisms. 214180 No Perforce job exists for this issue. 0 42138
9 years, 36 weeks, 6 days ago 0|i07kt3:
ZooKeeper ZOOKEEPER-816

Detecting and diagnosing elusive bugs and faults in Zookeeper

New Feature Open Minor Unresolved Unassigned Miguel Correia Miguel Correia 16/Jul/10 10:55   20/Jul/10 16:48           0 1   ZOOKEEPER-817, ZOOKEEPER-818, ZOOKEEPER-819 Complex distributed systems like Zookeeper tend to fail in strange ways that are hard to diagnose. The objective is to build a tool that helps understand when and where these problems occurred based on Zookeeper's traces (i.e., logs in TRACE level). Minor changes to the server code will be needed. 214179 No Perforce job exists for this issue. 0 42139
9 years, 36 weeks, 2 days ago 0|i07ktb:
ZooKeeper ZOOKEEPER-815

fill in "TBD"s in overview doc

Bug Open Minor Unresolved Unassigned Patrick D. Hunt Patrick D. Hunt 15/Jul/10 16:14   14/Dec/19 06:09   3.3.1 3.7.0 documentation   1 2   Funny: "Ephemeral nodes are useful when you want to implement [tbd]." There are a few others in that doc that should really be fixed.
documentation 70791 No Perforce job exists for this issue. 0 32856
9 years, 22 weeks ago 0|i05zjb:
ZooKeeper ZOOKEEPER-814

monitoring scripts are missing apache license headers

Bug Closed Blocker Fixed Andrei Savu Patrick D. Hunt Patrick D. Hunt 14/Jul/10 02:59   23/Nov/11 14:22 26/Jul/10 18:01   3.4.0 contrib   0 1   Andrei, I just realized that src/contrib/monitoring files are missing apache license headers. Please add them (in particular any script files like python, see similar files in svn for examples - in some cases like README it's not strictly necessary.)

You can run the RAT tool to verify (see build.xml or http://incubator.apache.org/rat/)
47586 No Perforce job exists for this issue. 1 32857
9 years, 35 weeks, 2 days ago
Reviewed
0|i05zjj:
ZooKeeper ZOOKEEPER-813

maven install is broken due to incorrect organisation

Bug Closed Critical Duplicate Jeff Hodges Jeff Hodges Jeff Hodges 12/Jul/10 03:14   23/Nov/11 14:22 12/Jul/10 18:49 3.3.1 3.3.2, 3.4.0 build   0 0   SBT doesn't like the pom file for ZooKeeper because, while it's under the "org.apache.hadoop" directory, its organisation is actually "org.apache.zookeeper". A simple fix for this is to just change "org.apache.zookeeper" to "org.apache.hadoop". 214178 No Perforce job exists for this issue. 0 32858
9 years, 37 weeks, 3 days ago 0|i05zjr:
ZooKeeper ZOOKEEPER-812

ZOOKEEPER-702 Failure Detector Model: Evaluate QoS metrics

Sub-task Open Major Unresolved Abmar Barros Abmar Barros Abmar Barros 12/Jul/10 03:03   12/Jul/10 03:03           0 0   214177 No Perforce job exists for this issue. 0 42140
9 years, 37 weeks, 3 days ago 0|i07ktj:
ZooKeeper ZOOKEEPER-811

ZOOKEEPER-702 Failure Detector Model: Refactor server to server monitoring

Sub-task Open Major Unresolved Abmar Barros Abmar Barros Abmar Barros 12/Jul/10 02:51   29/Jul/10 14:26           0 0   Refactor server to server failure detection code to use the FailureDetector interface proposed in the parent JIRA. The failure detection method and its parameters should also be configurable in this case.

Patches submitted in this JIRA use the latest patch of the parent JIRA as baseline.
214176 No Perforce job exists for this issue. 1 42141
9 years, 35 weeks ago 0|i07ktr:
ZooKeeper ZOOKEEPER-810

ZOOKEEPER-702 Failure Detector Model: Write Forrest docs

Sub-task Open Major Unresolved Abmar Barros Abmar Barros Abmar Barros 12/Jul/10 02:44   12/Jul/10 02:44           0 0   Write forrest docs about the Failure Detector Model implementation.

This documentation should help one to understand how the failure detection model works on ZooKeeper, both on client and server sides. The usage and configuration of this feature should also be addressed in this documentation.
214175 No Perforce job exists for this issue. 0 42142
9 years, 37 weeks, 3 days ago failure detector forrest doc 0|i07ktz:
ZooKeeper ZOOKEEPER-809

Improved REST Interface

Improvement Closed Major Fixed Andrei Savu Andrei Savu Andrei Savu 10/Jul/10 11:18   23/Nov/11 14:22 17/Aug/10 03:26   3.4.0 contrib   0 0   I would like to extend the existing REST Interface to also support:
* configuration
* ephemeral znodes
* watches - PubSubHubbub
* ACLs
* basic authentication

I want to do this because, when building web applications that talk directly to ZooKeeper, a REST API is a lot easier to use (there is no protocol mismatch) than an API that uses persistent connections. I plan to use the improved version to build a web-based administrative interface.
47587 No Perforce job exists for this issue. 9 33384
9 years, 32 weeks, 2 days ago
Reviewed
0|i062sn:
ZooKeeper ZOOKEEPER-808

Web-based Administrative Interface

New Feature Closed Major Fixed Andrei Savu Andrei Savu Andrei Savu 10/Jul/10 10:38   23/Nov/11 14:22 18/Aug/10 01:53   3.4.0 contrib   0 0   Implement a web-based administrative interface that allows the user to perform, from a browser, all the tasks that can be done using the interactive shell (zkCli.sh). It should also display cluster and individual server info extracted using the four-letter word commands.

I'm going to start from the zookeeper_dashboard (http://github.com/phunt/zookeeper_dashboard) implemented by Patrick Hunt.
47588 No Perforce job exists for this issue. 1 33385
9 years, 28 weeks, 1 day ago
Reviewed
web, interface, contrib 0|i062sv:
ZooKeeper ZOOKEEPER-806

Cluster management with Zookeeper - Norbert

New Feature Resolved Major Later Unassigned John Wang John Wang 07/Jul/10 10:44   22/Feb/13 00:48 22/Feb/13 00:48         0 1   Hello, we have built a cluster management layer on top of Zookeeper here at the SNA team at LinkedIn:

http://sna-projects.com/norbert/

We were wondering about ways to collaborate, as this is a very useful application of ZooKeeper.
214174 No Perforce job exists for this issue. 0 42143
9 years, 37 weeks, 5 days ago 0|i07ku7:
ZooKeeper ZOOKEEPER-805

four letter words fail with latest ubuntu nc.openbsd

Bug Resolved Critical Fixed Unassigned Patrick D. Hunt Patrick D. Hunt 06/Jul/10 19:48   30/Apr/14 16:19 30/Apr/14 16:19 3.3.1, 3.4.0 3.4.6 documentation, server   0 3   In both 3.3 branch and trunk "echo stat|nc localhost 2181" fails against the ZK server on Ubuntu Lucid Lynx.

I noticed this after upgrading to lucid lynx - which is now shipping openbsd nc as the default:

OpenBSD netcat (Debian patchlevel 1.89-3ubuntu2)

vs nc traditional

[v1.10-38]

which works fine. Not sure if this is a bug on our side or in nc.openbsd, but it's currently not working for me. Ugh.
71226 No Perforce job exists for this issue. 0 32859
5 years, 47 weeks, 1 day ago 0|i05zjz:
ZooKeeper ZOOKEEPER-804

c unit tests failing due to "assertion cptr failed"

Bug Closed Critical Fixed Michi Mutsuzaki Patrick D. Hunt Patrick D. Hunt 05/Jul/10 16:35   23/Nov/11 14:22 20/Oct/10 12:27 3.4.0 3.3.2, 3.4.0 c client   0 1   gcc 4.4.3, ubuntu lucid lynx, dual core laptop (intel) I'm seeing this frequently:

[exec] Zookeeper_simpleSystem::testPing : elapsed 18006 : OK
[exec] Zookeeper_simpleSystem::testAcl : elapsed 1022 : OK
[exec] Zookeeper_simpleSystem::testChroot : elapsed 3145 : OK
[exec] Zookeeper_simpleSystem::testAuth ZooKeeper server started : elapsed 25687 : OK
[exec] zktest-mt: /home/phunt/dev/workspace/gitzk/src/c/src/zookeeper.c:1952: zookeeper_process: Assertion `cptr' failed.
[exec] make: *** [run-check] Aborted
[exec] Zookeeper_simpleSystem::testHangingClient

Mahadev, can you take a look?
47589 No Perforce job exists for this issue. 3 32860
9 years, 23 weeks, 1 day ago
Reviewed
0|i05zk7:
ZooKeeper ZOOKEEPER-803

Improve defenses against misbehaving clients

Bug Open Major Unresolved Unassigned Travis Crawford Travis Crawford 02/Jul/10 16:52   02/Jul/10 23:03   3.3.0       0 2   This issue is in response to ZOOKEEPER-801. The short version: a small number of buggy clients opened thousands of connections and caused ZooKeeper to fail.

The misbehaving client did not correctly handle expired sessions, creating a new connection each time. The huge number of connections exacerbated the issue.
214173 No Perforce job exists for this issue. 1 32861
9 years, 38 weeks, 6 days ago 0|i05zkf:
ZooKeeper ZOOKEEPER-802

Improved LogGraph filters + documentation

Improvement Open Minor Unresolved Ivan Kelly Ivan Kelly Ivan Kelly 02/Jul/10 11:19   05/Feb/20 07:16   3.4.0 3.7.0, 3.5.8     0 0   The log filtering mechanism has been improved and extended to work with message logs. The documentation has also been moved into the Forrest documentation. 70751 No Perforce job exists for this issue. 6 42144
6 years, 24 weeks ago 0|i07kuf:
ZooKeeper ZOOKEEPER-801

Zookeeper outage post-mortem

Improvement Resolved Major Not A Problem Travis Crawford Travis Crawford Travis Crawford 01/Jul/10 12:39   02/Jul/10 23:03 02/Jul/10 14:31 3.3.0       0 3   - RHEL5 2.6.18 kernel
- Zookeeper 3.3.0
- ulimit raised to 65k files
- 3 cluster members
- 4-5k connections in steady-state
- Primarily C and python clients, plus some java
[Moving a thread from the zookeeper-user]

RECOVERY
We eventually recovered from this situation by shutting down clients. Initially I tried restarting the ZooKeeper servers; however, they were getting hammered and I believe sessions were timing out. I shut down ~2k clients (a lightweight Python app that simply sets one data watch and takes an action when it changes), at which point zktop could make a connection and a leader election was verified. After resetting the latency stats, the numbers were very good. I do not believe it would ever have recovered without shedding load.


QUORUM/ELECTIONS DURING EVENT
Unfortunately I do not have logs from the event :( We had debug logging on, with logrotate configured to keep ten 100MB files, so the interesting parts rotated away. I have already switched to info logging so we don't lose this data again.

During the incident I was not able to view cluster status with zktop, and I never observed a successful operation beyond connecting; connections quickly timed out.


GC PAUSE/LOGGING
This is a very good question. No, ZooKeeper GC is not tuned; it uses whatever defaults the start scripts set. GC logging is not enabled either. I filed an internal bug against myself to enable GC logging and tune the GC.


CLIENT SESSION TIMEOUTS
Clients do not explicitly set timeouts, and I believe sessionTimeout is 10 seconds, based on this log entry emitted when first connecting:

2010-07-01 05:14:00,260:44267(0x2af330240110):ZOO_INFO@zookeeper_init@727: Initiating client connection, host=10.209.21.133:2181,10.209.21.175:2181,10.209.21.181:2181 sessionTimeout=10000 watcher=(nil) sessionId=0 sessionPasswd=<null> context=(nil) flags=0


CLIENT BACKOFFS
Looking in application logs, we see lots of the following:

2010-07-01 05:13:14,674:41491(0x41ebf940):ZOO_ERROR@handle_socket_error_msg@1528: Socket [10.209.21.181:2181] zk retcode=-7, errno=110(Connection timed out): connection timed out (exceeded timeout by 0ms)

Doing some simple aggregations, we see 130 errors in a ~13 minute sample period. Across thousands of clients, this behavior effectively amounted to a DDoS attack against ZooKeeper. Looking at this data, exponential backoff seems appropriate as the default behavior.

The full text of the client errors is attached. I grepped "errno" from a Python client log; I believe it uses the same underlying C library, so I did not include example output from a C program (though I can if needed). It looks basically the same.


GOING FORWARD
A long GC pause causing clients to dogpile sounds like the most plausible explanation at this time. GC logging/tuning is clearly where I dropped the ball by just using the defaults; I don't think any changes should be made to compensate for the lack of tuning.

Exponential backoff does seem like a good idea, and generally useful for most people. There will always be service interruptions, and backoff would be a great preventive measure for getting out of a dogpile situation.
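The reconnect backoff proposed here can be sketched as capped exponential backoff with "full jitter". This is a hedged illustration, not the ZooKeeper client's actual reconnect logic. The function name `backoff_delays` and the `base`/`cap` defaults are assumptions chosen for the example.

```python
import random


def backoff_delays(attempts, base=0.1, cap=30.0, rng=None):
    """Return one reconnect delay (in seconds) per failed attempt.

    Hypothetical helper. The ceiling doubles after every failure
    (base * 2**n) but never exceeds `cap`. The actual delay is drawn
    uniformly from [0, ceiling] ("full jitter"), so thousands of clients
    do not retry in lockstep.
    """
    rng = rng or random.Random()
    delays = []
    for n in range(attempts):
        ceiling = min(cap, base * (2 ** n))
        delays.append(rng.uniform(0, ceiling))
    return delays


if __name__ == "__main__":
    # A fixed seed only makes the demo output reproducible.
    for n, delay in enumerate(backoff_delays(8, rng=random.Random(42))):
        print(f"attempt {n}: sleep {delay:.3f}s before reconnecting")
```

The jitter is the important part in a dogpile. Plain doubling would still have every client that failed at the same moment retry at the same moment.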



Patrick's message:
"""
Hi Travis, as Flavio suggested would be great to get the logs. A few questions:

1) how did you eventually recover, restart the zk servers?

2) was the cluster losing quorum during this time? leader re-election?

3) Any chance this could have been initially triggered by a long GC pause on one of the servers? (is gc logging turned on, any sort of heap monitoring?) Has the GC been tuned on the servers, for example CMS and incremental?

4) what are the clients using for timeout on the sessions?

3.4 probably not for a few months yet, but we are planning for a 3.3.2 in a few weeks to fix a couple critical issues (which don't seem related to what you saw). If we can identify the problem here we should be able to include it in any fix release we do.

fixing something like 517 might help, but it's not clear how we got to this state in the first place. fixing 517 might not have any effect if the root cause is not addressed. 662 has only ever been reported once afaik, and we weren't able to identify the root cause for that one.

One thing we might also consider is modifying the zk client lib to backoff connection attempts if they keep failing (timing out say). Today the clients are pretty aggressive on reconnection attempts. Having some sort of backoff (exponential?) would provide more breathing room to the server in this situation.

Patrick
"""

Flavio's message:
"""
Hi Travis, Do you think it would be possible for you to open a jira and upload your logs?

Thanks,
-Flavio
"""

My initial message:
"""
Hey zookeepers -

We just experienced a total zookeeper outage, and here's a quick
post-mortem of the issue, and some questions about preventing it going
forward. Quick overview of the setup:

- RHEL5 2.6.18 kernel
- Zookeeper 3.3.0
- ulimit raised to 65k files
- 3 cluster members
- 4-5k connections in steady-state
- Primarily C and python clients, plus some java

In chronological order, the issue manifested itself as an alert about RW
tests failing. Logs were full of "too many open files" errors, and the output
of netstat showed lots of CLOSE_WAIT and SYN_RECV sockets. CPU was
100%. Application logs showed lots of connection timeouts. This
suggests an event happened that caused applications to dogpile on
Zookeeper, and eventually the CLOSE_WAIT timeout caused file handles
to run out and basically game over.

I looked through lots of logs (clients+servers) and did not see a
clear indication of what happened. Graphs show a sudden decrease in
network traffic when the outage began, zookeeper goes cpu bound, and
runs out of file descriptors.

Clients are primarily a couple thousand C clients using default
connection parameters, and a couple thousand python clients using
default connection parameters.

Digging through Jira we see two issues that probably contributed to this outage:

https://issues.apache.org/jira/browse/ZOOKEEPER-662
https://issues.apache.org/jira/browse/ZOOKEEPER-517

Both are tagged for the 3.4.0 release. Anyone know if that's still the
case, and when 3.4.0 is roughly scheduled to ship?

Thanks!
Travis
"""
214172 No Perforce job exists for this issue. 2 33386
9 years, 38 weeks, 6 days ago zookeeper outage postmortem 0|i062t3:
ZooKeeper ZOOKEEPER-800

zoo_add_auth returns ZOK if zookeeper handle is in ZOO_CLOSED_STATE

Bug Closed Minor Fixed Michi Mutsuzaki Michi Mutsuzaki Michi Mutsuzaki 29/Jun/10 19:26   23/Nov/11 14:22 21/Oct/10 18:52 3.3.1 3.3.2, 3.4.0 c client   0 4   This happened when I called zoo_add_auth() immediately after zookeeper_init(). It took me a while to figure out that authentication actually failed since zoo_add_auth() returned ZOK. It should return ZINVALIDSTATE instead.

--Michi
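The requested behavior amounts to a state guard at the top of the call. The sketch below models it in Python rather than C. `Handle`, `State` and `pending_auth` are hypothetical names, and the string constants only stand in for the C client's ZOK and ZINVALIDSTATE return codes.

```python
from enum import Enum


class State(Enum):
    CONNECTING = "connecting"
    CONNECTED = "connected"
    CLOSED = "closed"


# Symbolic stand-ins for the C client's ZOK / ZINVALIDSTATE return codes.
ZOK = "ZOK"
ZINVALIDSTATE = "ZINVALIDSTATE"


class Handle:
    """Hypothetical model of a client handle; not the real C API."""

    def __init__(self, state=State.CONNECTING):
        self.state = state
        self.pending_auth = []  # credentials to send once connected

    def add_auth(self, scheme, cert):
        # A closed handle can never send the credentials, so report an
        # error instead of a misleading success.
        if self.state is State.CLOSED:
            return ZINVALIDSTATE
        self.pending_auth.append((scheme, cert))
        return ZOK
```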
47590 No Perforce job exists for this issue. 1 32862
8 years, 40 weeks ago
Reviewed
0|i05zkn:
ZooKeeper ZOOKEEPER-799

Add tools and recipes for monitoring as a contrib

New Feature Closed Major Fixed Andrei Savu Andrei Savu Andrei Savu 29/Jun/10 17:13   17/Sep/12 09:21 14/Jul/10 02:41   3.4.0 contrib   0 2   Tools and Recipes for Monitoring ZooKeeper using Cacti, Nagios or Ganglia. 47591 No Perforce job exists for this issue. 2 33387
7 years, 27 weeks, 3 days ago Tools and Recipes for Monitoring ZooKeeper using Cacti, Nagios or Ganglia.
Reviewed
monitoring, cacti, nagios, ganglia, contrib 0|i062tb:
ZooKeeper ZOOKEEPER-798

ZOOKEEPER-789 Fixup loggraph for FLE changes

Sub-task Closed Minor Fixed Ivan Kelly Ivan Kelly Ivan Kelly 29/Jun/10 09:01   23/Nov/11 14:22 05/Jul/10 15:59   3.4.0 contrib   0 0   47592 No Perforce job exists for this issue. 1 33388
9 years, 38 weeks, 2 days ago
Reviewed
0|i062tj:
ZooKeeper ZOOKEEPER-797

c client source with AI_ADDRCONFIG cannot be compiled with early glibc

Improvement Closed Major Fixed Qian Ye Qian Ye Qian Ye 29/Jun/10 00:06   23/Nov/11 14:22 05/Jul/10 16:33 3.3.1 3.4.0 c client   0 0   linux 2.6.9 c client source with AI_ADDRCONFIG cannot be compiled with early glibc (before 2.3.3) 47593 No Perforce job exists for this issue. 1 33389
9 years, 38 weeks, 2 days ago
Reviewed
c client 0|i062tr:
ZooKeeper ZOOKEEPER-796

zkServer.sh should support an external PIDFILE variable

Bug Closed Major Fixed Alex Newman Alex Newman Alex Newman 28/Jun/10 18:02   23/Nov/11 14:22 06/Jul/10 17:51   3.4.0 scripts   0 1   Currently the PID file has to be tied to the data directory when starting zkServer.sh. It would be good to be able to separate them. 47594 No Perforce job exists for this issue. 2 32863
9 years, 38 weeks, 1 day ago
Reviewed
0|i05zkv:
ZooKeeper ZOOKEEPER-795

eventThread isn't shutdown after a connection "session expired" event coming

Bug Closed Blocker Fixed Sergey Doroshenko mathieu barcikowski mathieu barcikowski 28/Jun/10 06:12   13/Jun/13 14:28 17/Aug/10 16:05 3.3.1 3.3.2, 3.4.0 java client   0 4   ubuntu 10.04 Hi,

I noticed a problem with the eventThread in ClientCnxn.java.
The eventThread isn't shut down after a "session expired" connection event arrives (i.e. it never receives the EventOfDeath).

When a session timeout occurs and the session is marked as expired, the connection is fully closed (socket, SendThread...) except for the eventThread.
As a result, if I create a new ZooKeeper object and connect through it, I get a zombie thread that will never be killed (for the previous ZooKeeper object, the state is already closed, so calling close again doesn't do anything).

So every time I create a new ZooKeeper connection after an expired session, I have one more zombie EventThread.

How to reproduce:
- Start a ZooKeeper client connection in debug mode
- Pause the JVM long enough for the expired event to occur
- Watch the list of threads, for example with jvisualvm: the SendThread is successfully killed, but the EventThread goes into a wait state forever
- If you open a new ZooKeeper connection and repeat the previous steps, another EventThread will be present in an infinite wait state
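The expected behavior can be sketched with a queue-driven worker thread and a sentinel object that plays the role of the event of death. This is a Python model under stated assumptions, not ClientCnxn itself. `EventThread.session_expired` and `EVENT_OF_DEATH` are hypothetical names. The point is that expiration must enqueue the sentinel after the last real event, so the thread drains and then exits.

```python
import queue
import threading

# Hypothetical sentinel playing the role of ClientCnxn's "event of death".
EVENT_OF_DEATH = object()


class EventThread(threading.Thread):
    """Hypothetical model of a client event thread."""

    def __init__(self):
        super().__init__(daemon=True)
        self.events = queue.Queue()
        self.delivered = []

    def run(self):
        while True:
            event = self.events.get()  # blocks until something is queued
            if event is EVENT_OF_DEATH:
                return  # exit instead of waiting forever
            self.delivered.append(event)

    def session_expired(self):
        # Deliver the expiration notification first, then ask the thread
        # to stop, so no zombie thread is left behind.
        self.events.put("Expired")
        self.events.put(EVENT_OF_DEATH)
```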




47595 No Perforce job exists for this issue. 3 32864
9 years, 32 weeks, 2 days ago 0|i05zl3:
ZooKeeper ZOOKEEPER-794

Callbacks are not invoked when the client is closed

Bug Closed Blocker Fixed Alexis Midon Alexis Midon Alexis Midon 25/Jun/10 22:47   23/Nov/11 14:22 20/Oct/10 20:47 3.3.1 3.3.2, 3.4.0 java client   0 4   I noticed that ZooKeeper has different behaviors when calling synchronous or asynchronous actions on a closed ZooKeeper client.
A synchronous call throws a "session expired" exception, while an asynchronous call does nothing: no exception and no callback invocation.

Even though the EventThread receives the Packet with the session expired error code, the packet is never processed, because the thread has already been killed by the eventOfDeath. So the callback is not invoked.
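The invariant this issue asks for is that every asynchronous call gets exactly one callback, even across close. The sketch below is a hedged Python model. `AsyncClient`, `get_data`, `reply` and the string result codes are hypothetical, not the Java client API. Requests still in flight at close time are failed with a session-expired code before teardown, and calls made after close fail immediately.

```python
SESSION_EXPIRED = "SESSIONEXPIRED"
OK = "OK"


class AsyncClient:
    """Hypothetical model: every async call gets exactly one callback."""

    def __init__(self):
        self.closed = False
        self.pending = []  # (path, callback) awaiting a server reply

    def get_data(self, path, callback):
        if self.closed:
            # Mirror the synchronous API's "session expired" error.
            callback(SESSION_EXPIRED, path, None)
        else:
            self.pending.append((path, callback))

    def reply(self, data):
        # Stand-in for a server response to the oldest request.
        path, callback = self.pending.pop(0)
        callback(OK, path, data)

    def close(self):
        self.closed = True
        # Fail everything still in flight so no callback is silently lost.
        pending, self.pending = self.pending, []
        for path, callback in pending:
            callback(SESSION_EXPIRED, path, None)
```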

47596 No Perforce job exists for this issue. 7 32865
9 years, 23 weeks ago
Reviewed
0|i05zlb:
ZooKeeper ZOOKEEPER-793

ZOOKEEPER-775 Large-scale Pub/Sub System (C++ Client)

Sub-task Resolved Major Fixed Ivan Kelly Ivan Kelly Ivan Kelly 25/Jun/10 10:39   05/May/11 07:59 05/May/11 07:59         0 1   Write a C++ client for Hedwig 47597 No Perforce job exists for this issue. 1 33390
8 years, 47 weeks ago 0|i062tz:
ZooKeeper ZOOKEEPER-792

zkpython memory leak

Bug Closed Major Fixed Lei Zhang Lei Zhang Lei Zhang 24/Jun/10 17:36   23/Nov/11 14:22 22/Aug/10 22:59 3.3.1 3.3.2, 3.4.0 contrib-bindings   0 1   vmware workstation - guest OS:Linux python:2.4.3 We recently upgraded ZooKeeper from 3.2.1 to 3.3.1, and now we are seeing fewer client deadlocks on session expiration, which is a definite plus!

Unfortunately we are seeing a memory leak that requires our zk clients to be restarted every half-day. Valgrind result:

==8804== 25 (12 direct, 13 indirect) bytes in 1 blocks are definitely lost in loss record 255 of 670
==8804== at 0x4021C42: calloc (vg_replace_malloc.c:418)
==8804== by 0x5047B42: parse_acls (zookeeper.c:369)
==8804== by 0x5047EF6: pyzoo_create (zookeeper.c:1009)
==8804== by 0x40786CC: PyCFunction_Call (in /usr/lib/libpython2.4.so.1.0)
==8804== by 0x40B31DC: PyEval_EvalFrame (in /usr/lib/libpython2.4.so.1.0)
==8804== by 0x40B4485: PyEval_EvalCodeEx (in /usr/lib/libpython2.4.so.1.0)
47598 No Perforce job exists for this issue. 3 32866
9 years, 28 weeks, 1 day ago
Reviewed
0|i05zlj:
ZooKeeper ZOOKEEPER-791

Watches get triggered during client's reconnection

Bug Open Minor Unresolved Unassigned Sergey Doroshenko Sergey Doroshenko 24/Jun/10 08:20   24/Jun/10 08:22           0 1   I start 2 of 3 servers of an ensemble, connect to it with zkCli.sh, do "ls / 1" which registers a watch.
Then I kill one of the 2 servers, which makes the surviving one lose quorum and forces the client to reconnect.

And when the client connects to this alive server (but gets quickly dropped by the server afterwards), watch is triggered:
WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/

I can reproduce it only with the command-line client, and quite rarely. I tried to write a unit test, but it didn't catch this.
Has anybody seen this before?
214171 No Perforce job exists for this issue. 1 32867
9 years, 40 weeks ago 0|i05zlr:
ZooKeeper ZOOKEEPER-790

Last processed zxid set prematurely while establishing leadership

Bug Closed Blocker Fixed Flavio Paiva Junqueira Flavio Paiva Junqueira Flavio Paiva Junqueira 22/Jun/10 12:47   23/Nov/11 14:22 29/Jul/10 17:11 3.3.1 3.3.2, 3.4.0 quorum   0 3   The leader code sets the last processed zxid to the first zxid of the new epoch even before it has connected to a quorum of followers (Leader.java:281). Because the follower code throws an IOException (Follower.java:73) if the leader's epoch is smaller, a false leader that drops leadership and becomes a follower finds a smaller epoch and kills itself. 47599 No Perforce job exists for this issue. 14 32868
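A zxid is a 64-bit number: the high 32 bits hold the epoch and the low 32 bits hold a counter within that epoch. The sketch below shows that packing and a simplified version of the follower's epoch comparison. `follower_accepts` is a hypothetical stand-in for the check at Follower.java:73, and the epoch numbers in the usage are made up. It illustrates why advancing the last processed zxid to a new epoch too early makes the peer reject a legitimate leader later.

```python
def make_zxid(epoch, counter):
    """Pack a zxid: high 32 bits = epoch, low 32 bits = counter."""
    return (epoch << 32) | (counter & 0xFFFFFFFF)


def epoch_of(zxid):
    return zxid >> 32


def counter_of(zxid):
    return zxid & 0xFFFFFFFF


def follower_accepts(leader_zxid, my_last_zxid):
    """Hypothetical, simplified follower check: refuse a leader whose
    epoch is older than the epoch of our own last processed zxid."""
    return epoch_of(leader_zxid) >= epoch_of(my_last_zxid)


# A would-be leader prematurely jumps to the first zxid of epoch 5 ...
premature_last = make_zxid(5, 0)
# ... then falls back and meets the real leader, still in epoch 4.
print(follower_accepts(make_zxid(4, 100), premature_last))
```

With these made-up numbers the final line prints `False`: the peer refuses the real leader. That is the self-inflicted failure the issue describes.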
9 years, 34 weeks, 6 days ago
Reviewed
0|i05zlz:
ZooKeeper ZOOKEEPER-789

Improve FLE log messages

Improvement Closed Major Fixed Flavio Paiva Junqueira Flavio Paiva Junqueira Flavio Paiva Junqueira 18/Jun/10 17:01   23/Nov/11 14:22 05/Jul/10 15:53 3.3.1 3.3.2, 3.4.0     0 0   ZOOKEEPER-798 Notification messages are quite important for determining what is going on with leader election. The main idea of this improvement is to name the fields we output in notification log messages. 47600 No Perforce job exists for this issue. 4 33391
9 years, 38 weeks, 2 days ago
Reviewed
0|i062u7:
ZooKeeper ZOOKEEPER-788

Add server id to message logs

Improvement Closed Trivial Fixed Ivan Kelly Ivan Kelly Ivan Kelly 17/Jun/10 12:45   23/Nov/11 14:21 25/Jun/10 16:11 3.3.1 3.4.0 contrib   0 0   As discussed on IRC. The log visualisation needs some way of determining which server made which log. If the log segment is taken for a time period where no elections take place, there is no way to determine the id of the server. 47601 No Perforce job exists for this issue. 1 33392
9 years, 39 weeks, 6 days ago
Reviewed
0|i062uf:
ZooKeeper ZOOKEEPER-787

groupId in deployed pom is wrong

Bug Closed Blocker Fixed Unassigned Chris Conrad Chris Conrad 10/Jun/10 12:46   23/Nov/11 14:22 15/Sep/10 11:39 3.3.1 3.3.2, 3.4.0     1 2   The pom deployed to repo1.maven.org has the project declared like this:

<groupId>org.apache.zookeeper</groupId>
<artifactId>zookeeper</artifactId>
<packaging>jar</packaging>
<version>3.3.1</version>

But it is deployed here: http://repo2.maven.org/maven2/org/apache/hadoop/zookeeper/3.3.1

So either the groupId needs to change or the location it is deployed to needs to change, because having them differ results in bad behavior. If you specify the correct groupId in your own pom/ivy files, you can't even download ZooKeeper, because it's not where your pom says it is. If you use the "incorrect" groupId, you can download ZooKeeper, but then Ivy complains about:

[error] :: problems summary ::
[error] :::: ERRORS
[error] public: bad organisation found in http://repo1.maven.org/maven2/org/apache/hadoop/zookeeper/3.3.1/zookeeper-3.3.1.pom: expected='org.apache.hadoop' found='org.apache.zookeeper'
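The mismatch follows from the Maven 2 repository layout, where each dot in the groupId becomes a directory level. The helper below is a hypothetical illustration of that layout, not part of Maven or Ivy. It shows that the declared groupId and the deployed location resolve to different paths.

```python
def repo_path(group_id, artifact_id, version, ext="pom"):
    """Hypothetical helper: path of an artifact in a Maven 2 layout repo."""
    group_dir = group_id.replace(".", "/")
    return f"{group_dir}/{artifact_id}/{version}/{artifact_id}-{version}.{ext}"


declared = repo_path("org.apache.zookeeper", "zookeeper", "3.3.1")
deployed = repo_path("org.apache.hadoop", "zookeeper", "3.3.1")
print(declared)  # where a resolver looks, given the pom's own groupId
print(deployed)  # where the pom was actually uploaded
```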
47602 No Perforce job exists for this issue. 0 32869
9 years, 28 weeks, 1 day ago
Reviewed
0|i05zm7:
ZooKeeper ZOOKEEPER-786

Exception in ZooKeeper.toString

Bug Resolved Minor Fixed Thomas Koch Stephen Green Stephen Green 04/Jun/10 17:25   16/Oct/11 21:00 16/Oct/11 21:00 3.3.1 3.5.0 java client   1 2   Mac OS X, x86 When trying to call ZooKeeper.toString during client disconnections, an exception can be generated:


[04/06/10 15:39:57.744] ERROR Error while calling watcher
java.lang.Error: java.net.SocketException: Socket operation on non-socket
at sun.nio.ch.Net.localAddress(Net.java:128)
at sun.nio.ch.SocketChannelImpl.localAddress(SocketChannelImpl.java:430)
at sun.nio.ch.SocketAdaptor.getLocalAddress(SocketAdaptor.java:147)
at java.net.Socket.getLocalSocketAddress(Socket.java:717)
at org.apache.zookeeper.ClientCnxn.getLocalSocketAddress(ClientCnxn.java:227)
at org.apache.zookeeper.ClientCnxn.toString(ClientCnxn.java:183)
at java.lang.String.valueOf(String.java:2826)
at java.lang.StringBuilder.append(StringBuilder.java:115)
at org.apache.zookeeper.ZooKeeper.toString(ZooKeeper.java:1486)
at java.util.Formatter$FormatSpecifier.printString(Formatter.java:2794)
at java.util.Formatter$FormatSpecifier.print(Formatter.java:2677)
at java.util.Formatter.format(Formatter.java:2433)
at java.util.Formatter.format(Formatter.java:2367)
at java.lang.String.format(String.java:2769)
at com.echonest.cluster.ZooContainer.process(ZooContainer.java:544)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:488)
Caused by: java.net.SocketException: Socket operation on non-socket
at sun.nio.ch.Net.localInetAddress(Native Method)
at sun.nio.ch.Net.localAddress(Net.java:125)
... 15 more
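A debug string should never throw, so a defensive pattern is to catch socket errors while formatting. The sketch below uses Python sockets as an analogue. `describe_connection` is a hypothetical helper, not the actual ClientCnxn.toString fix. On a closed or disconnected socket, `getsockname`/`getpeername` raise `OSError`, and the helper falls back to a placeholder.

```python
import socket


def describe_connection(sock):
    """Hypothetical helper: build a debug string without letting socket
    errors escape (e.g. when called from a watcher mid-disconnect)."""
    try:
        local = sock.getsockname()
    except OSError:
        return "local=<not connected>"
    try:
        remote = sock.getpeername()
    except OSError:
        remote = "<not connected>"
    return f"local={local} remote={remote}"
```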
19329 No Perforce job exists for this issue. 1 32870
8 years, 26 weeks ago
Reviewed
0|i05zmf:
ZooKeeper ZOOKEEPER-785

Zookeeper 3.3.1 shouldn't infinite loop if someone creates a server.0 line

Bug Closed Major Fixed Patrick D. Hunt Alex Newman Alex Newman 02/Jun/10 18:51   23/Nov/11 14:22 14/Sep/10 17:09 3.3.1 3.3.2, 3.4.0 server   0 1   Tested in linux with a new jvm The following config causes an infinite loop

[zoo.cfg]
tickTime=2000
dataDir=/var/zookeeper/
clientPort=2181
initLimit=10
syncLimit=5
server.0=localhost:2888:3888

Output:

2010-06-01 16:20:32,471 - INFO [main:QuorumPeerMain@119] - Starting quorum peer
2010-06-01 16:20:32,489 - INFO [main:NIOServerCnxn$Factory@143] - binding to port 0.0.0.0/0.0.0.0:2181
2010-06-01 16:20:32,504 - INFO [main:QuorumPeer@818] - tickTime set to 2000
2010-06-01 16:20:32,504 - INFO [main:QuorumPeer@829] - minSessionTimeout set to -1
2010-06-01 16:20:32,505 - INFO [main:QuorumPeer@840] - maxSessionTimeout set to -1
2010-06-01 16:20:32,505 - INFO [main:QuorumPeer@855] - initLimit set to 10
2010-06-01 16:20:32,526 - INFO [main:FileSnap@82] - Reading snapshot /var/zookeeper/version-2/snapshot.c
2010-06-01 16:20:32,547 - INFO [Thread-1:QuorumCnxManager$Listener@436] - My election bind port: 3888
2010-06-01 16:20:32,554 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@620] - LOOKING
2010-06-01 16:20:32,556 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@649] - New election. My id = 0, Proposed zxid = 12
2010-06-01 16:20:32,558 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@689] - Notification: 0, 12, 1, 0, LOOKING, LOOKING, 0
2010-06-01 16:20:32,560 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@623] - Unexpected exception
java.lang.NullPointerException
at org.apache.zookeeper.server.quorum.FastLeaderElection.totalOrderPredicate(FastLeaderElection.java:496)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:709)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:621)
2010-06-01 16:20:32,560 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@620] - LOOKING
2010-06-01 16:20:32,560 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@649] - New election. My id = 0, Proposed zxid = 12
2010-06-01 16:20:32,561 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@689] - Notification: 0, 12, 2, 0, LOOKING, LOOKING, 0
2010-06-01 16:20:32,561 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@623] - Unexpected exception
java.lang.NullPointerException
at org.apache.zookeeper.server.quorum.FastLeaderElection.totalOrderPredicate(FastLeaderElection.java:496)
at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:709)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:621)
2010-06-01 16:20:32,561 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@620] - LOOKING
2010-06-01 16:20:32,562 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@649] - New election. My id = 0, Proposed zxid = 12
2010-06-01 16:20:32,562 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@689] - Notification: 0, 12, 3, 0, LOOKING, LOOKING, 0
2010-06-01 16:20:32,562 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@623] - Unexpected exception
java.lang.NullPointerException


Things like HBase require that the ZooKeeper servers be listed in zoo.cfg. This is a bug on their part, but ZooKeeper shouldn't throw a NullPointerException in a loop.
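One way to avoid the loop is to inspect the `server.N` entries before deciding whether to enter leader election at all. The sketch below is a hypothetical policy under stated assumptions. It is not necessarily what the committed fix does. `parse_servers` and `choose_mode` are illustrative names, and the rule "a single entry means standalone, with a warning" is an assumption.

```python
def parse_servers(cfg_text):
    """Return {server_id: address} for the server.N lines of zoo.cfg text."""
    servers = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key.startswith("server."):
            servers[int(key[len("server."):])] = value
    return servers


def choose_mode(cfg_text):
    """Hypothetical policy: a lone server.N entry is treated as standalone
    (with a warning) instead of entering leader election."""
    servers = parse_servers(cfg_text)
    if len(servers) >= 2:
        return "quorum", None
    if servers:
        return "standalone", "only one server.N entry found; starting standalone"
    return "standalone", None
```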
47603 No Perforce job exists for this issue. 8 32871
9 years, 28 weeks, 1 day ago
Reviewed
0|i05zmn:
ZooKeeper ZOOKEEPER-784

ZOOKEEPER-704 server-side functionality for read-only mode

Sub-task Closed Major Fixed Sergey Doroshenko Sergey Doroshenko Sergey Doroshenko 02/Jun/10 18:16   11/Mar/14 04:35 19/May/11 19:45   3.4.0 server   0 4   As per http://wiki.apache.org/hadoop/ZooKeeper/GSoCReadOnlyMode , create ReadOnlyZooKeeperServer which comes into play when peer is partitioned. 47604 No Perforce job exists for this issue. 13 33393
8 years, 44 weeks, 5 days ago 0|i062un:
ZooKeeper ZOOKEEPER-783

committedLog in ZKDatabase is not properly synchronized

Bug Closed Critical Fixed Henry Robinson Henry Robinson Henry Robinson 01/Jun/10 15:22   23/Nov/11 14:22 26/Jul/10 18:45 3.3.1 3.3.2, 3.4.0 server   1 1   ZKDatabase.getCommittedLog() returns a reference to the LinkedList<Proposal> committedLog in ZKDatabase. This is then iterated over by at least one caller.

I have seen a bug that causes an NPE in LinkedList.clear on committedLog, which I am pretty sure is due to the lack of synchronization. This bug has not been apparent in normal ZK operation, only in code of mine that repeatedly starts and stops a ZK server in process (clear() is called from ZooKeeperServerMain.shutdown()).

It's better style to defensively copy the list in getCommittedLog, and to synchronize on the list in ZKDatabase.clear.
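The two remedies suggested above, a defensive copy and clearing under the same lock, can be sketched in Python. `ZKDatabaseSketch` and its methods are hypothetical, and `collections.deque` stands in for Java's `LinkedList<Proposal>`. Callers iterate their own snapshot, so a concurrent `clear()` cannot mutate the list underneath them.

```python
import threading
from collections import deque


class ZKDatabaseSketch:
    """Hypothetical model of the committed-proposal log in ZKDatabase."""

    def __init__(self):
        self._lock = threading.Lock()
        self._committed_log = deque()

    def add_committed(self, proposal):
        with self._lock:
            self._committed_log.append(proposal)

    def get_committed_log(self):
        # Defensive copy taken under the lock: callers iterate a snapshot.
        with self._lock:
            return list(self._committed_log)

    def clear(self):
        # Same lock as readers and writers, so clear() never races them.
        with self._lock:
            self._committed_log.clear()
```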

47605 No Perforce job exists for this issue. 1 32872
9 years, 35 weeks, 2 days ago
Reviewed
0|i05zmv:
ZooKeeper ZOOKEEPER-782

Incorrect C API documentation for Watches

Bug Closed Trivial Fixed Mahadev Konar Dave Wright Dave Wright 31/May/10 16:47   23/Nov/11 14:22 14/Jul/11 13:54 3.3.1 3.4.0 c client, documentation   0 2   The C API Doxygen documentation states:

" .... If the client is ever disconnected from the service, even if the
disconnection is temporary, the watches of the client will be removed from
the service, so a client must treat a disconnect notification as an implicit
trigger of all outstanding watches."

This is incorrect as of version 3. Watches are lost, and need to be re-registered, only when a session times out. When a normal disconnection occurs, watches are reset automatically on reconnection.

The documentation in zookeeper.h needs to be updated to correct this explanation.
47606 No Perforce job exists for this issue. 1 32873
8 years, 36 weeks, 6 days ago Corrected documentation on watch behavior in C API
Reviewed
0|i05zn3:
ZooKeeper ZOOKEEPER-781

provide a generalized "connection strategy" for ZooKeeper clients

New Feature Open Major Unresolved Qian Ye Patrick D. Hunt Patrick D. Hunt 26/May/10 14:00   05/Feb/20 07:16     3.7.0, 3.5.8 c client, java client   1 2   A connection strategy allows control over the way ZooKeeper clients (we would implement this for both the C and Java APIs) connect to a serving ensemble. Today we have two strategies, randomized round robin (the default) and ordered round robin, both of which are hard-coded into the client implementation. We would generalize this interface and allow users to create their own.

See this page for more detail: http://wiki.apache.org/hadoop/ZooKeeper/ConnectionStrategy
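The two built-in strategies can be expressed behind a single "give me the next host" interface. The sketch below is a hedged Python illustration. `OrderedRoundRobin`, `RandomizedRoundRobin` and `next_host` are hypothetical names, not the interface proposed on the wiki page. A user-defined strategy would be any other class exposing `next_host()`.

```python
import random


class OrderedRoundRobin:
    """Hypothetical strategy: try hosts in the order given, wrapping around."""

    def __init__(self, hosts):
        self.hosts = list(hosts)
        self._i = 0

    def next_host(self):
        host = self.hosts[self._i % len(self.hosts)]
        self._i += 1
        return host


class RandomizedRoundRobin(OrderedRoundRobin):
    """Hypothetical strategy: shuffle once, then round-robin, so clients
    spread their load across the ensemble."""

    def __init__(self, hosts, rng=None):
        hosts = list(hosts)
        (rng or random.Random()).shuffle(hosts)
        super().__init__(hosts)
```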
66777 No Perforce job exists for this issue. 7 42145
8 years, 35 weeks, 1 day ago a draft patch for c client 0|i07kun:
ZooKeeper ZOOKEEPER-780

zkCli.sh generates a ArrayIndexOutOfBoundsException

Bug Resolved Minor Invalid Unassigned Miguel Correia Miguel Correia 25/May/10 06:22   24/Apr/14 19:52 24/Apr/14 19:52 3.3.1 3.5.0 scripts   0 3   Linux Ubuntu running in VMPlayer on top of Windows XP I'm just starting to play with ZooKeeper, so I'm still running it in standalone mode. This is not a big issue, but here it is for the record.

I ran zkCli.sh to run some commands on the server. I created a znode /groups. When I tried to create a znode client_1 inside /groups, I forgot to include the data: an exception was generated and zkCli.sh crashed, instead of just showing an error. I tried a few variations, and the problem seems to be the missing data.

A copy of the screen:

[zk: localhost:2181(CONNECTED) 3] create /groups firstgroup
Created /groups
[zk: localhost:2181(CONNECTED) 4] create -e /groups/client_1
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 3
at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:678)
at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:581)
at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:353)
at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:311)
at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:270)
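The resolution summary for this issue is to assume an empty byte array when no data is given. The sketch below is a hypothetical Python parser for the `create [-s] [-e] path [data]` form. It is not ZooKeeperMain's code, and it ignores the optional ACL argument for brevity. A missing path raises a usage error instead of an index error, and missing data defaults to `b""`.

```python
def parse_create(line):
    """Hypothetical parser for 'create [-s] [-e] path [data]'."""
    tokens = line.split()
    if not tokens or tokens[0] != "create":
        raise ValueError("not a create command")
    flags, args = set(), []
    for tok in tokens[1:]:
        if tok in ("-s", "-e") and not args:
            flags.add(tok)  # flags are only recognized before the path
        else:
            args.append(tok)
    if not args:
        raise ValueError("usage: create [-s] [-e] path [data]")
    return {
        "path": args[0],
        "data": args[1].encode() if len(args) > 1 else b"",  # default: empty
        "sequential": "-s" in flags,
        "ephemeral": "-e" in flags,
    }
```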
70800 No Perforce job exists for this issue. 3 32874
5 years, 48 weeks ago If no data is provided for the new node when using the "create" zkCli.sh command assume an empty byte array. 0|i05znb:
ZooKeeper ZOOKEEPER-779

C Client should check the connectivity to the hosts in zookeeper_init

Improvement Open Major Unresolved Unassigned Qian Ye Qian Ye 22/May/10 23:00   26/May/10 14:00   3.3.1   c client   0 0   In some scenarios, whether the client can connect to the ZooKeeper servers is used as a logic condition: if the client cannot connect to the servers, the program should take another branch. However, the current zookeeper_init cannot tell whether the client can connect to a server or not, which can confuse users. I think we should check the connectivity to the hosts in zookeeper_init, so we can tell whether the hosts are available at that time. 214170 No Perforce job exists for this issue. 2 42146
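A connectivity probe of the kind proposed can be as simple as a TCP connect with a timeout to each entry of the connect string. The sketch below assumes a plain "host:port,host:port" string with no chroot suffix, and `reachable_hosts` is a hypothetical helper. A successful connect only shows that something is listening, not that a ZooKeeper server is serving requests.

```python
import socket


def reachable_hosts(host_string, timeout=1.0):
    """Hypothetical probe: return the host:port entries of a connect string
    ("h1:2181,h2:2181") that accept a TCP connection within `timeout`."""
    ok = []
    for entry in host_string.split(","):
        entry = entry.strip()
        host, _, port = entry.rpartition(":")
        try:
            with socket.create_connection((host, int(port)), timeout=timeout):
                ok.append(entry)
        except OSError:
            pass  # refused, unreachable, or timed out
    return ok
```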
9 years, 44 weeks, 1 day ago 0|i07kuv:
ZooKeeper ZOOKEEPER-778

ability to add a watch on a setData or create call

Improvement Open Minor Unresolved Unassigned Woody Anderson Woody Anderson 22/May/10 16:22   15/Nov/19 19:56       c client, java client, server   1 2   It is often desirable to set a watch when creating a node or setting data on a node. Currently, you have to add a watch after the create/set with another API call, which incurs extra cost and opens a window of unobserved state changes.
This would "seem" to be an easy addition to the server/client libs, but I'm not sure whether there are reasons this was never proposed or developed.

I am currently most concerned with a data watch in these two scenarios, but I would imagine other users might be interested in registering a children watch immediately upon creation.

This change would require adding new method signatures for create and setData in the clients that take watchers, plus some changes to the protocol, as the SetDataRequest and CreateRequest objects would need watch flags.
feature 62683 No Perforce job exists for this issue. 0 42147
17 weeks, 5 days ago 0|i07kv3:
ZooKeeper ZOOKEEPER-777

setting acl on a nonexistent node should return no node error

Bug Resolved Major Invalid Unassigned Kapil Thangavelu Kapil Thangavelu 21/May/10 11:52   18/Nov/11 20:07 18/Nov/11 20:07 3.3.0, 3.3.1   server   0 0   Currently it just returns successfully, but the ACL can't be retrieved, and if any value is being stored, it's overwritten when the node is created. 47607 No Perforce job exists for this issue. 1 32875
8 years, 18 weeks, 5 days ago 0|i05znj:
ZooKeeper ZOOKEEPER-776

API should sanity check sessionTimeout argument

Improvement Patch Available Minor Unresolved Raúl Gutiérrez Segalés Gregory Haskins Gregory Haskins 21/May/10 10:12   05/Feb/20 07:12   3.2.2, 3.3.0, 3.3.1, 3.4.6, 3.5.0 3.7.0, 3.5.8 c client, java client   0 3   OSX 10.6.3, JVM 1.6.0-20 Passing a "0" sessionTimeout to the ZooKeeper() constructor leads to errors in subsequent operations. It would be ideal to catch this configuration error at the source, by throwing something like an IllegalArgumentException when the bogus sessionTimeout is specified, rather than later when it is used. 70786 No Perforce job exists for this issue. 4 42148
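The proposed fail-fast check is a small guard that runs before anything else uses the value. The sketch below uses Python, where `ValueError` plays the role of Java's `IllegalArgumentException`. `check_session_timeout` is a hypothetical helper, and treating only positive integer milliseconds as valid is an assumption.

```python
def check_session_timeout(session_timeout_ms):
    """Hypothetical guard: reject a bogus sessionTimeout at construction
    time instead of failing later when it is used."""
    if isinstance(session_timeout_ms, bool) or not isinstance(session_timeout_ms, int):
        raise TypeError("sessionTimeout must be an integer number of milliseconds")
    if session_timeout_ms <= 0:
        raise ValueError(f"sessionTimeout must be positive, got {session_timeout_ms}")
    return session_timeout_ms
```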
3 years, 39 weeks, 2 days ago 0|i07kvb:
ZooKeeper ZOOKEEPER-775

A large scale pub/sub system

New Feature Closed Major Fixed Benjamin Reed Benjamin Reed Benjamin Reed 18/May/10 00:21   23/Nov/11 14:22 19/Aug/10 17:29   3.4.0 contrib   0 15   ZOOKEEPER-793 we have developed a large scale pub/sub system based on ZooKeeper and BookKeeper. 47608 No Perforce job exists for this issue. 8 33394
9 years, 28 weeks, 1 day ago A pub/sub system using BookKeeper and ZooKeeper with C++ and Java client bindings.
Reviewed
0|i062uv:
ZooKeeper ZOOKEEPER-774

Recipes tests are slightly outdated: they do not compile against JUnit 4.8

Bug Closed Minor Fixed Sergey Doroshenko Sergey Doroshenko Sergey Doroshenko 12/May/10 17:31   23/Nov/11 14:22 14/May/10 19:32 3.3.0 3.4.0 recipes   0 1   As title 47609 No Perforce job exists for this issue. 1 32876
9 years, 43 weeks, 2 days ago
Reviewed
0|i05znr:
ZooKeeper ZOOKEEPER-773

Log visualisation

Improvement Closed Minor Fixed Ivan Kelly Ivan Kelly Ivan Kelly 11/May/10 11:20   23/Nov/11 14:22 09/Jun/10 11:27   3.4.0 contrib 11/Oct/10 0 0   Zkgraph is a log viewer for ZooKeeper. It can handle transaction logs and message logs. There are currently two views.

a) Server view
The server view shows the interactions between the different servers in an ensemble. The X axis represents time.
* Exceptions show up as red dots. Hovering your mouse over them will give you more details of the exception.
* The colour of the line represents the election state of the server.
- orange means LOOKING for leader
- dark green means the server is the leader
- light green means the server is following a leader
- yellow means there isn't enough information to determine the state of the server.
* The gray arrows denote election messages between servers. Pink dashed arrows are messages that were sent but never delivered.

b) Session view
The session view shows the lifetime of sessions on a server. Use the time filter to narrow down the view. More than about 2000 events will take a long time to render in your browser.
The Y axis represents time in this case. Each line is a session. The black dots represent events on the session. You can click on the black dots for more details of the event.

2 - Compiling & Running

Run "ant jar" in src/contrib/zkgraph/. This will download all dependencies and compile all the zkgraph code.

Once compilation has finished, you can run it with the zkgraph.sh script in src/contrib/zkgraph/bin. This will start an embedded web server on your machine. Navigate to http://localhost:8182/graph/main.html.
47610 No Perforce job exists for this issue. 3 33395
9 years, 42 weeks ago
Reviewed
0|i062v3:
ZooKeeper ZOOKEEPER-772

zkpython segfaults when watcher from async get children is invoked.

Bug Closed Major Fixed Henry Robinson Kapil Thangavelu Kapil Thangavelu 10/May/10 10:42   23/Nov/11 14:22 11/Aug/10 14:31   3.3.2, 3.4.0 contrib-bindings   0 1   ubuntu lucid (10.04) / zk trunk When using the zkpython async get children API with a watch, I consistently get segfaults when the watcher is invoked to process events. 47611 No Perforce job exists for this issue. 4 32877
9 years, 33 weeks, 1 day ago
Reviewed
0|i05znz:
ZooKeeper ZOOKEEPER-771

zkpython return without exception set on invalid auth scheme

Bug Open Minor Unresolved Unassigned Kapil Thangavelu Kapil Thangavelu 07/May/10 15:51   07/May/10 16:07       contrib-bindings   0 1   ubuntu lucid If you attempt to use an invalid auth scheme when adding authentication, you get an error return value in your callback. However, the handle itself is then unusable: any further use of it with any part of the API returns

SystemError: error return without exception set

214169 No Perforce job exists for this issue. 1 32878
9 years, 46 weeks, 6 days ago 0|i05zo7:
ZooKeeper ZOOKEEPER-770

Slow add_auth calls with multi-threaded client

Bug Patch Available Major Unresolved Craig Calef Kapil Thangavelu Kapil Thangavelu 06/May/10 15:50   05/Feb/20 07:11   3.3.0, 3.3.3, 3.4.0 3.7.0, 3.5.8 c client, contrib-bindings   1 8   ubuntu lucid (10.04), zk trunk (3.4) Calls to add_auth from the C client library are a bit slow: the auth callback typically takes multiple seconds to fire. I instrumented the Java, C binding, and Python binding with a few log statements to find out where the slowness was occurring (http://bazaar.launchpad.net/~hazmat/zookeeper/fast-auth-instrumented/revision/647). It looks like when the IO thread polls, it doesn't register interest in the incoming packet, so the auth success message from the server and the auth callback are only processed when the poll times out. I tried modifying mt_adaptor.c so the poll registers interest in both events; this causes considerably more wakeups, but it does make add_auth fast. I think the ideal solution would be an additional auth-handshake state on the handle, which zookeeper_interest could use to request both POLLIN|POLLOUT for subsequent calls to poll while the handle is in that state.

I'm attaching a script for which the auth callback takes 13s or 1.6s, depending on the session timeout value (which in turn figures into the calculation of the poll timeout).
67869 No Perforce job exists for this issue. 5 32879
3 years, 39 weeks, 2 days ago 0|i05zof:
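For ZOOKEEPER-770, the proposed extra handshake state can be sketched as a function that picks the poll(2) event mask. This is a Python illustration (Unix-only, using the standard `select` module constants), not the C client's `zookeeper_interest`; the flags `has_pending_writes`, `expecting_reply`, and `auth_in_progress` are hypothetical stand-ins for state on the handle.

```python
import select


def interest_events(has_pending_writes, expecting_reply, auth_in_progress):
    """Return the poll(2) event mask an IO thread should register.

    Hypothetical sketch: none of these flags are real C client fields.
    """
    events = 0
    if has_pending_writes:
        events |= select.POLLOUT  # wake when the socket is writable
    if expecting_reply:
        events |= select.POLLIN  # wake when a response arrives
    if auth_in_progress:
        # Proposed fix: while the auth handshake is pending, ask for both
        # so the server's auth reply wakes the thread immediately instead
        # of being noticed only when poll() times out.
        events |= select.POLLIN | select.POLLOUT
    return events
```

The mask would be passed to `select.poll().register(sock, mask)` before each poll; the extra wakeups noted in the description would then be limited to the period while `auth_in_progress` is set.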
ZooKeeper ZOOKEEPER-769

Leader can treat observers as quorum members

Bug Closed Major Fixed Sergey Doroshenko Sergey Doroshenko Sergey Doroshenko 06/May/10 14:01   23/Nov/11 14:22 21/May/10 12:23 3.3.0 3.4.0     0 4   Ubuntu Karmic x64 In short: it seems the leader can treat observers as quorum members.

Steps to repro:

1. Server configuration: 3 voters, 2 observers (attached).
2. Bring up 2 voters and one observer. It's enough for quorum.
3. Shut down the member of the quorum that is the follower.

As I understand it, the expected result is that the leader will start a new election round to regain quorum.
But what actually happens is that it just says goodbye to that follower and remains operable. (When I then shut down the third server, the observer, the leader starts trying to regain a quorum.)

(As expected, if in step 3 we shut down the leader instead of the follower, the remaining follower starts a new leader election.)
47612 No Perforce job exists for this issue. 7 32880
9 years, 43 weeks, 2 days ago
Reviewed
0|i05zon:
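For ZOOKEEPER-769, the expected behaviour can be stated as a quorum check that counts only voting members. This Python sketch is illustrative only, with hypothetical server ids; it is not the leader's actual quorum-verification code.

```python
def has_quorum(connected_ids, voting_ids):
    """Return True if the connected servers include a majority of voters.

    Observers (any id not in voting_ids) are deliberately ignored: they
    receive updates but must not count towards the quorum.
    """
    voters = set(voting_ids)
    connected_voters = voters & set(connected_ids)
    return len(connected_voters) > len(voters) // 2
```

With voters `{1, 2, 3}` and observers `{4, 5}`, the repro above starts with `has_quorum({1, 2, 4}, {1, 2, 3})` true; once the follower is shut down, `has_quorum({1, 4}, {1, 2, 3})` is false, so the leader should stop serving instead of remaining operable.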
ZooKeeper ZOOKEEPER-768

zkpython segfault on close (assertion error in io thread)

Bug Open Major Unresolved Unassigned Kapil Thangavelu Kapil Thangavelu 06/May/10 12:05   06/May/10 18:13   3.4.0   contrib-bindings   0 0   ubuntu lucid (10.04), zookeeper trunk (java/c/zkpython) While trying to create a test case showing a slow average add_auth time, I stumbled upon a test case that reliably segfaults for me, albeit after a variable number of iterations (typically anywhere from 0 to 20). FWIW, I've got about 220 processes in my test environment (Ubuntu Lucid 10.04). The test case opens a connection, adds authentication to it, and closes the connection, in a loop. I'm including the sample program and the gdb stack traces from the core file. I can upload the core file if that's helpful. 214168 No Perforce job exists for this issue. 4 32881
9 years, 47 weeks ago 0|i05zov:
ZooKeeper ZOOKEEPER-767

Submitting Demo/Recipe Shared / Exclusive Lock Code

Improvement Resolved Minor Won't Fix Sam Baskinger Sam Baskinger Sam Baskinger 05/May/10 16:32   15/May/13 14:20 15/May/13 14:20 3.3.0 3.5.0 recipes   0 5 28800   Networked Insights would like to share-back some code for shared/exclusive locking that we are using in our labs. 100% 100% 28800 41 No Perforce job exists for this issue. 6 42149
6 years, 45 weeks, 1 day ago New recipe code. 0|i07kvj:
ZooKeeper ZOOKEEPER-766

forrest recipes docs don't mention the lock/queue recipe implementations available in the release

Bug Closed Minor Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 05/May/10 15:04   23/Nov/11 14:22 05/May/10 18:51   3.3.1, 3.4.0 documentation, recipes   0 1   Update the forrest recipes docs to point to the recipe implementations (where available). 47613 No Perforce job exists for this issue. 1 32882
9 years, 47 weeks, 1 day ago
Reviewed
0|i05zp3:
ZooKeeper ZOOKEEPER-765

Add python example script

Improvement Closed Minor Fixed Andrei Savu Travis Crawford Travis Crawford 04/May/10 18:59   21/May/12 03:17 27/Jul/10 11:12   3.4.0 contrib-bindings, documentation   0 3   When adding some ZooKeeper-based functionality to a Python script I had to figure everything out without guidance, which, while doable, would have been a lot easier with an example. I extracted a skeleton program structure in the hope that it's useful to others (maybe add it as an example in the source or wiki?).

This script does an aget() and sets a watch, and hopefully illustrates what's going on, and where to plug in your application code that gets run when the znode changes.

There are probably some bugs; if we fix them now and provide a well-reviewed example, hopefully others will not make the same mistakes.
47614 No Perforce job exists for this issue. 3 33396
9 years, 35 weeks, 1 day ago A skeleton script that shows how to set up znode watches and how to react to events using the Python client libraries.
Reviewed
0|i062vb:
ZooKeeper ZOOKEEPER-764

Observer elected leader due to inconsistent voting view

Bug Closed Major Fixed Henry Robinson Flavio Paiva Junqueira Flavio Paiva Junqueira 04/May/10 17:41   23/Nov/11 14:22 05/May/10 18:27   3.3.1, 3.4.0 quorum   0 1   In ZOOKEEPER-690, we noticed that an observer was being elected, and Henry proposed a patch to fix the issue. However, it seems that the patch does not solve the issue one user (Alan Cabrera) has observed. Given that we would like to fix this issue, and to work separately with Alan to determine the problem with his setup, I'm creating this jira and re-posting Henry's patch. 47615 No Perforce job exists for this issue. 2 32883
9 years, 47 weeks, 1 day ago
Reviewed
0|i05zpb:
ZooKeeper ZOOKEEPER-763

Deadlock on close w/ zkpython / c client

Bug Closed Major Fixed Henry Robinson Kapil Thangavelu Kapil Thangavelu 04/May/10 09:09   23/Nov/11 14:22 05/May/10 18:02 3.3.0 3.3.1, 3.4.0 contrib-bindings   0 1   ubuntu 10.04, zookeeper 3.3.0 and trunk Deadlocks occur if we attempt to close a handle while there are any outstanding async requests (aget, acreate, etc.). Normally on close both the IO thread and the completion thread are terminated and joined; however, with outstanding async requests, the completion thread won't be in a joinable state, and we effectively hang when the main thread does the join.

AFAICS, the ideal behavior on close of a handle would be to clear out any remaining callbacks and let the completion thread terminate.

I've tried adding some bookkeeping within a Python client to guard against closing while there is an outstanding async completion request, but it's an imperfect solution, since even after the Python callback is executed there is still a window for deadlock before the completion thread finishes the callback.

A simple example to reproduce the deadlock is attached.
47616 No Perforce job exists for this issue. 5 32884
9 years, 47 weeks ago
Reviewed
0|i05zpj:
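For ZOOKEEPER-763, the behaviour suggested above (on close, clear out remaining callbacks and let the completion thread terminate) can be modelled in Python with a queue and a worker thread. `CompletionDispatcher` is a hypothetical toy model, not the zkpython or C client implementation, and the bounded `join` timeout is an added safety assumption.

```python
import queue
import threading


class CompletionDispatcher:
    """Toy model of a client completion thread; not the zkpython/C API."""

    _STOP = object()  # sentinel telling the worker thread to exit

    def __init__(self):
        self._pending = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, callback, *args):
        """Queue a completion callback, as an async reply would."""
        self._pending.put((callback, args))

    def _run(self):
        while True:
            item = self._pending.get()
            if item is self._STOP:
                return
            callback, args = item
            callback(*args)

    def close(self, timeout=5.0):
        """Drop callbacks that have not started, stop the worker, join it.

        Returns the number of discarded callbacks. Must not be called from
        inside a callback: joining the current thread raises RuntimeError.
        """
        discarded = 0
        while True:
            try:
                self._pending.get_nowait()
            except queue.Empty:
                break
            discarded += 1
        # The worker is now at most finishing one callback; the sentinel
        # guarantees it leaves its loop, so the join cannot hang forever.
        self._pending.put(self._STOP)
        self._thread.join(timeout)
        return discarded
```

Every submitted callback either runs to completion or is counted as discarded, so closing with requests still outstanding no longer depends on the completion thread draining them first.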
ZooKeeper ZOOKEEPER-762

ZOOKEEPER-107 Allow dynamic addition/removal of server nodes in the client API

Sub-task Resolved Minor Duplicate Unassigned Dave Wright Dave Wright 03/May/10 16:54   29/Dec/12 18:47 29/Dec/12 18:47   3.5.0 c client, java client   1 9   Currently the list of zookeeper servers needs to be provided to the client APIs at construction time, and cannot be changed without a complete shutdown/restart of the client API. However, there are scenarios that require the server list to be updated, such as removal or addition of a ZK cluster node, and it would be nice if the list could be updated via a simple API call.

The general approach (in the Java client) would be to add "RemoveServer()/AddServer()" functions to ZooKeeper that call down to ClientCnxn, where the servers are simply maintained in a list. Of course, if the server being removed is the one currently connected, we'd need to disconnect, but a simple call to disconnect() seems like it would handle that and trigger the automatic reconnection logic.
An equivalent change could be made in the C code.

This change would also make dynamic cluster membership in ZOOKEEPER-107 easier to implement.
214167 No Perforce job exists for this issue. 1 42150
7 years, 12 weeks, 5 days ago 0|i07kvr:
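For ZOOKEEPER-762, the proposed list maintenance can be sketched in Python. `ServerList`, `add_server`, and `remove_server` are hypothetical names loosely following the "AddServer()/RemoveServer()" idea; the actual disconnect and reconnect would remain the client's job, signalled here by the return value of `remove_server`.

```python
class ServerList:
    """Hypothetical client-side server list supporting runtime changes."""

    def __init__(self, servers):
        # De-duplicate while keeping the caller's order.
        self._servers = list(dict.fromkeys(servers))
        self.current = self._servers[0] if self._servers else None

    def add_server(self, address):
        if address not in self._servers:
            self._servers.append(address)

    def remove_server(self, address):
        """Remove address; return True if the caller must reconnect."""
        if address not in self._servers:
            return False
        self._servers.remove(address)
        if address == self.current:
            # We were connected to the removed server: pick another one
            # and tell the caller to disconnect so reconnection kicks in.
            self.current = self._servers[0] if self._servers else None
            return True
        return False

    def servers(self):
        return list(self._servers)
```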
ZooKeeper ZOOKEEPER-761

Remove *synchronous* calls from the *single-threaded* C client API, since they are documented not to work

Improvement Resolved Blocker Fixed Benjamin Reed Jozef Hatala Jozef Hatala 29/Apr/10 18:14   30/Jan/19 08:05 25/Mar/18 21:31 3.1.1, 3.2.2 3.5.3, 3.6.0 c client   0 7 0 600   RHEL 4u8 (Linux). The issue is not OS-specific though. Since the synchronous calls are [known|http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#Using+the+C+Client] to be unimplemented in the single-threaded version of the client library libzookeeper_st.so, I believe it would be helpful to users of the library if that information were also obvious from the header file.

Anecdotally, more than one of us here made the mistake of starting out with the synchronous calls in the single-threaded library, and we found ourselves debugging it. An early warning would have been greatly appreciated.

1. Could you please add warnings to the doxygen blocks of all synchronous calls saying that they are not available in the single-threaded API. This cannot be safely done with {{#ifdef THREADED}}, obviously, because the same header file is included whichever client library implementation one is compiling for.

2. Could you please bracket the implementation of all synchronous calls in zookeeper.c with {{#ifdef THREADED}} and {{#endif}}, so that those symbols are not present in libzookeeper_st.so?
100% 100% 600 0 pull-request-available 67868 No Perforce job exists for this issue. 2 42151
1 year, 51 weeks, 3 days ago Removed synchronous calls from the single-threaded API as they are not implemented and documented as such. 0|i07kvz:
ZooKeeper ZOOKEEPER-760

Improved string encoding and decoding performance

Improvement Open Major Unresolved Unassigned Patrick D. Hunt Patrick D. Hunt 29/Apr/10 14:46   05/Feb/20 07:16     3.7.0, 3.5.8 java client, server   0 1   Our marshaling code converts strings to UTF-8 bytes; this can be optimized, see:

https://issues.apache.org/jira/browse/AVRO-532
70794 No Perforce job exists for this issue. 0 42152
9 years, 48 weeks ago 0|i07kw7:
ZooKeeper ZOOKEEPER-759

Stop accepting connections when close to file descriptor limit

Improvement Open Major Unresolved Unassigned Travis Crawford Travis Crawford 29/Apr/10 14:02   05/Feb/20 07:16     3.7.0, 3.5.8 server   0 6   ZooKeeper always tries to accept new connections, throwing an exception when it runs out of file descriptors. An improvement would be to deny new client connections when close to the limit.

Additionally, file-descriptor limits+usage should be exported to the monitoring four-letter word, should that get implemented (see ZOOKEEPER-744).


DETAILS

A Zookeeper ensemble I administer recently suffered an outage when one node was restarted with the low system-default ulimit of 1024 file descriptors and later ran out. File descriptor usage+max are already being monitored by the following MBeans:

- java.lang.OperatingSystem.MaxFileDescriptorCount
- java.lang.OperatingSystem.OpenFileDescriptorCount

They're described (rather tersely) at:

http://java.sun.com/javase/6/docs/jre/api/management/extension/com/sun/management/UnixOperatingSystemMXBean.html

This feature request is for the following:

(a) Stop accepting new connections when OpenFileDescriptorCount is close to MaxFileDescriptorCount, defaulting to 95% FD usage. New connections should be denied, logged to disk at debug level, and increment a ``ConnectionDeniedCount`` MBean counter.

(b) Begin accepting new connections when usage drops below some configurable threshold, defaulting to 90% of FD usage, basically the high/low watermark model.

(c) Update the administrators guide with a comment about using an appropriate FD limit.

(d) Extra credit: if ZOOKEEPER-744 is implemented export statistics for:

zookeeper_open_file_descriptor_count
zookeeper_max_file_descriptor_count
zookeeper_max_file_descriptor_mismatch - boolean, exported by leader, if not all zk's have the same max FD value
70780 No Perforce job exists for this issue. 0 42153
9 years, 43 weeks, 6 days ago 0|i07kwf:
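For ZOOKEEPER-759, the high/low watermark model in (a) and (b) can be sketched as a small state machine. This Python class is hypothetical; in the server the two inputs would come from the OpenFileDescriptorCount and MaxFileDescriptorCount values named above, and `connection_denied_count` stands in for the proposed ``ConnectionDeniedCount`` counter.

```python
class ConnectionGate:
    """High/low watermark admission control for new client connections.

    Stops accepting at `high` (default 95%) file-descriptor usage and
    resumes only once usage drops below `low` (default 90%).
    """

    def __init__(self, high=0.95, low=0.90):
        if not 0 < low < high <= 1:
            raise ValueError("require 0 < low < high <= 1")
        self.high = high
        self.low = low
        self.accepting = True
        self.connection_denied_count = 0  # analogue of ConnectionDeniedCount

    def update(self, open_fds, max_fds):
        """Recompute the accepting state from current FD usage."""
        usage = open_fds / max_fds
        if self.accepting and usage >= self.high:
            self.accepting = False
        elif not self.accepting and usage < self.low:
            self.accepting = True
        return self.accepting

    def try_accept(self, open_fds, max_fds):
        """Return True if a new connection may be accepted right now."""
        if self.update(open_fds, max_fds):
            return True
        self.connection_denied_count += 1
        return False
```

Keeping the resume threshold below the stop threshold avoids flapping between accepting and denying when usage hovers around a single limit.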
ZooKeeper ZOOKEEPER-758

zkpython segfaults on invalid acl with missing key

Bug Closed Major Fixed Kapil Thangavelu Kapil Thangavelu Kapil Thangavelu 28/Apr/10 21:28   23/Nov/11 14:22 30/Apr/10 21:14 3.3.0, 3.4.0 3.3.1, 3.4.0 contrib-bindings   0 1   ubuntu lucid (10.04) Currently, when setting an ACL, there is a minimal parse to ensure that it's a list of dicts; however, if one of the dicts is missing a required key, the subsequent code doesn't check for it and will segfault. For example, using an ACL of [{"schema":id, "id":world, permissions:PERM_ALL}] will segfault, because the scheme key is missing (it has been purposely typo'd to schema in the example).

I've expanded the check_acl macro to include verifying that all keys are present and added some unit tests against trunk in the attachments.
40905 No Perforce job exists for this issue. 3 32885
9 years, 47 weeks, 5 days ago
Reviewed
0|i05zpr:
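For ZOOKEEPER-758, the expanded check can be expressed in Python before any value reaches C code. This `check_acl` is a hypothetical Python rendition of the macro mentioned above, and the required keys `perms`, `scheme`, and `id` are assumed to be the keys zkpython expects in each ACL dict.

```python
# Assumed zkpython ACL dict keys; verify against the binding you use.
REQUIRED_ACL_KEYS = ("perms", "scheme", "id")


def check_acl(acl):
    """Validate an ACL: a list of dicts, each containing every required key."""
    if not isinstance(acl, list):
        raise TypeError("ACL must be a list of dicts")
    for i, entry in enumerate(acl):
        if not isinstance(entry, dict):
            raise TypeError("ACL entry %d is not a dict" % i)
        missing = [k for k in REQUIRED_ACL_KEYS if k not in entry]
        if missing:
            # Report the problem instead of letting C code dereference
            # a missing value and segfault.
            raise ValueError(
                "ACL entry %d is missing key(s): %s" % (i, ", ".join(missing)))
    return True
```

The typo'd example from the description (`schema` instead of `scheme`) is then rejected with a `ValueError` naming the missing key.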
ZooKeeper ZOOKEEPER-757

zkpython acl/auth usage needs documentation + unit test

Bug Open Major Unresolved Unassigned Kapil Thangavelu Kapil Thangavelu 28/Apr/10 20:13   28/Apr/10 20:36   3.3.0, 3.4.0   contrib-bindings, documentation   0 1   ubuntu karmic / lucid ... sun jdk 1.6.0_20
The ZooKeeper digest authentication and ACL scheme needs a bit more documentation. Currently it's documented in the programmer's guide:

"""
digest uses a username:password string to generate MD5 hash which is then used as an ACL ID identity. Authentication is done by sending the username:password in clear text. When used in the ACL the expression will be the username:base64 encoded SHA1 password digest.
"""

However, it's actually the digest of the entire credential that needs to be used.

I've attached a python unit test that sets and verifies an acl on a node.

214166 No Perforce job exists for this issue. 1 32886
9 years, 48 weeks, 1 day ago 0|i05zpz:
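For ZOOKEEPER-757, the point that the digest covers the entire credential can be shown in Python. This sketch assumes the ACL id format `username:base64(sha1("username:password"))`, which is believed to match ZooKeeper's `DigestAuthenticationProvider.generateDigest`; `digest_acl_id` is a hypothetical helper name, so verify the result against your server version.

```python
import base64
import hashlib


def digest_acl_id(credential):
    """Build the "digest" scheme ACL id from a "username:password" string.

    The SHA-1 digest is taken over the *entire* credential, not just the
    password; the id is "username:" + base64(sha1(credential)).
    """
    username, sep, _password = credential.partition(":")
    if not sep:
        raise ValueError("credential must look like username:password")
    digest = hashlib.sha1(credential.encode("utf-8")).digest()
    return "%s:%s" % (username, base64.b64encode(digest).decode("ascii"))
```

The returned string would go in the `id` field of an ACL entry whose scheme is `digest`, while authentication itself sends the clear-text `username:password`, as the quoted guide says.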
ZooKeeper ZOOKEEPER-756

some cleanup and improvements for zooinspector

Improvement Resolved Major Fixed Thomas Koch Thomas Koch Thomas Koch 28/Apr/10 03:03   15/Dec/11 06:58 14/Dec/11 19:06 3.3.0 3.5.0 contrib   0 1   Copied from the already closed ZOOKEEPER-678:

* specify the exact URL, where the icons are from. It's best to include the link also in the NOTICE.txt file.


It seems that zooinspector finds its icons only if the icons folder is in the current path. But when I install zooinspector as part of the ZooKeeper Debian package, I want to be able to call it regardless of the current path.
Could you use getResource() or something similar, so that I can point to the icons location from the wrapper shell script?

Can I place the zooinspector config files in /etc/zookeeper/zooinspector/ ? Could I give zooinspector a property to point to the config file location?

There are several places where "viewers" is misspelled as "Veiwers". Please do a case-insensitive search for "veiw" to correct these. Even the config file "defaultNodeVeiwers.cfg" is misspelled like this. This has the potential to confuse the hell out of people when debugging something!
zooinspector 42 No Perforce job exists for this issue. 9 33397
8 years, 15 weeks ago
Reviewed
zooinspector 0|i062vj:
ZooKeeper ZOOKEEPER-755

Improve C client documentation to reflect that zookeeper_init() creates its own copy of the list of hosts.

Improvement Open Major Unresolved Mahadev Konar Mahadev Konar Mahadev Konar 27/Apr/10 16:09   05/Feb/20 07:16     3.7.0, 3.5.8 c client   0 0   The zookeeper.h file does not mention whether zookeeper_init() creates its own copy of the host string. We need to clarify that in the documentation. 70734 No Perforce job exists for this issue. 0 42154
9 years, 48 weeks, 2 days ago 0|i07kwn:
ZooKeeper ZOOKEEPER-754

numerous misspellings "succesfully"

Task Closed Major Fixed Andrei Savu Thomas Koch Thomas Koch 27/Apr/10 08:57   23/Nov/11 14:22 27/Apr/10 18:47 3.3.0 3.3.1, 3.4.0 contrib-bindings, documentation   0 0   When testing the Debian package of ZooKeeper with the standard tool "lintian", it fills my screen with complaints about the misspelling of "succesfully" in several places in the zkpython contrib. Please be so kind as to correct this the next time you touch the code. Thanks! 47617 No Perforce job exists for this issue. 2 33398
9 years, 48 weeks, 2 days ago fixed numerous misspellings of "succesfully" in the c client and python bindings
Reviewed
0|i062vr:
ZooKeeper ZOOKEEPER-753

update log4j dependency from 1.2.15 to 1.2.16 in branch 3.4

Bug Closed Major Fixed Sean Busbey Karthik K Karthik K 26/Apr/10 02:31   13/Mar/14 14:16 12/Dec/12 02:41 3.4.5 3.4.6     2 4   http://repo2.maven.org/maven2/org/apache/hadoop/zookeeper/3.3.0/zookeeper-3.3.0.pom

The pom declares the log4j dependency as is, without exclusions:

<dependency>
  <groupId>log4j</groupId>
  <artifactId>log4j</artifactId>
  <version>1.2.15</version>
  <scope>compile</scope>
</dependency>

This is broken without an exclusion list, since the transitive dependencies (javax.mail etc.) are, for the most part, not necessary.

Please fix this along with 3.3.1 and republish the new dependencies, since in its current state it is usable by some projects (to host in central, say).

Correct dependency for log4j:


<dependency>
  <groupId>log4j</groupId>
  <artifactId>log4j</artifactId>
  <version>1.2.15</version>
  <scope>compile</scope>
  <exclusions>
    <exclusion>
      <groupId>javax.mail</groupId>
      <artifactId>mail</artifactId>
    </exclusion>
    <exclusion>
      <groupId>javax.jms</groupId>
      <artifactId>jms</artifactId>
    </exclusion>
    <exclusion>
      <groupId>com.sun.jdmk</groupId>
      <artifactId>jmxtools</artifactId>
    </exclusion>
    <exclusion>
      <groupId>com.sun.jmx</groupId>
      <artifactId>jmxri</artifactId>
    </exclusion>
  </exclusions>
</dependency>
ivy 65458 No Perforce job exists for this issue. 2 2350
6 years, 2 weeks ago
Reviewed
0|i00r9r:
ZooKeeper ZOOKEEPER-752

address use of "recoverable" vs "revocable" in lock recipes documentation

Bug Resolved Major Duplicate Unassigned Patrick D. Hunt Patrick D. Hunt 22/Apr/10 16:36   04/May/14 08:18 04/May/14 08:18 3.3.0 3.5.0 documentation   0 1   http://hadoop.apache.org/zookeeper/docs/r3.3.0/recipes.html#sc_recoverableSharedLocks
uses the heading "recoverable" locks, but the text refers to "revocable".
70765 No Perforce job exists for this issue. 0 32887
5 years, 46 weeks, 4 days ago 0|i05zq7:
ZooKeeper ZOOKEEPER-751

Recipe heading refers to 'recoverable' but should be 'revocable'

Improvement Resolved Minor Fixed Michi Mutsuzaki Adam Rosien Adam Rosien 22/Apr/10 16:34   05/May/14 06:55 04/May/14 08:39 3.3.0 3.5.0 documentation   0 3   http://hadoop.apache.org/zookeeper/docs/r3.3.0/recipes.html#sc_recoverableSharedLocks uses the heading "recoverable" locks, but the text refers to "revocable". 214165 No Perforce job exists for this issue. 1 42155
5 years, 46 weeks, 3 days ago documentation recipe 0|i07kwv:
ZooKeeper ZOOKEEPER-750

move maven artifacts into "dist-maven" subdir of the release (package target)

Bug Closed Major Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 22/Apr/10 12:45   23/Nov/11 14:22 28/Apr/10 22:26 3.3.0 3.3.1, 3.4.0 build   0 1   The Maven artifacts are currently (3.3.0) put into the top level of the release. This causes confusion amongst new users (i.e. "which jar do I use?"). Also, the naming of the bin jar is wrong for Maven (to put it onto the Maven repo it must be named without the -bin), which adds extra burden for the release manager. Putting them into a subdir fixes this and makes it explicit what's being deployed to the Maven repo.
47618 No Perforce job exists for this issue. 0 32888
9 years, 48 weeks ago 0|i05zqf:
ZooKeeper ZOOKEEPER-749

OSGi metadata not included in binary only jar

Bug Closed Critical Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 22/Apr/10 12:21   23/Nov/11 14:22 28/Apr/10 22:02 3.3.0 3.3.1, 3.4.0 build   0 2   See this JIRA/comment for background:
https://issues.apache.org/jira/browse/ZOOKEEPER-425?focusedCommentId=12859697&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12859697

Basically, the issue is that OSGi metadata is included in the legacy jar (zookeeper-<version>.jar) but not in the binary-only jar (zookeeper-<version>-bin.jar), which is eventually deployed to the Maven repo.
47619 No Perforce job exists for this issue. 1 32889
9 years, 48 weeks ago
Reviewed
0|i05zqn:
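For ZOOKEEPER-749, a quick way to confirm whether a built jar carries OSGi metadata is to inspect its manifest. This Python sketch uses only the standard `zipfile` module; which headers to require (`Bundle-SymbolicName`, `Bundle-Version`, `Export-Package`) is an assumption, and `missing_osgi_headers` is a hypothetical helper.

```python
import zipfile

# Assumed set of OSGi headers worth checking for; adjust as needed.
OSGI_HEADERS = ("Bundle-SymbolicName", "Bundle-Version", "Export-Package")


def missing_osgi_headers(jar_path):
    """Return the OSGi manifest headers absent from a jar's MANIFEST.MF.

    jar_path may be a filename or a file-like object.
    """
    with zipfile.ZipFile(jar_path) as jar:
        try:
            manifest = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
        except KeyError:  # the jar has no manifest at all
            return list(OSGI_HEADERS)
    present = set()
    for line in manifest.splitlines():
        # Continuation lines start with a space; header lines are "Name: value".
        if line and not line.startswith(" ") and ":" in line:
            present.add(line.split(":", 1)[0].strip())
    return [h for h in OSGI_HEADERS if h not in present]
```

Running it on both zookeeper-<version>.jar and zookeeper-<version>-bin.jar would surface the discrepancy described above; an empty list means all checked headers are present.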
ZooKeeper ZOOKEEPER-748

zkPython's NodeExistsException should include information about the node that exists

Improvement Open Major Unresolved Unassigned Joseph Koshy Joseph Koshy 22/Apr/10 00:58   05/Feb/20 07:16   3.3.0 3.7.0, 3.5.8 contrib-bindings   0 4   Currently the code creates a {{zookeeper.NodeExistsException}} object with a string argument "node exists".

Including the name of the node that caused the exception would be useful, in that it allows user code like the following:
{code:title=example1}
try:
    zookeeper.create(zh, n1, ...)
    zookeeper.create(zh, n2, ...)
except zookeeper.NodeExistsException, n:
    print "Node \"%s\" exists." % n
{code}
70738 No Perforce job exists for this issue. 0 42156
9 years, 49 weeks ago 0|i07kx3:
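For ZOOKEEPER-748, here is a Python 3 sketch of an exception that carries the offending path. Both the class and the toy `create` function are hypothetical stand-ins and do not reflect zkpython's current behaviour, which, as described above, only passes the string "node exists".

```python
class NodeExistsException(Exception):
    """Sketch of an exception that records which node already existed."""

    def __init__(self, path, message="node exists"):
        super().__init__("%s: %s" % (message, path))
        self.path = path  # lets callers report exactly which create failed


def create(existing, path):
    """Toy stand-in for a create call over a set of existing paths."""
    if path in existing:
        raise NodeExistsException(path)
    existing.add(path)
    return path
```

Callers could then write `except NodeExistsException as n: print('Node "%s" exists.' % n.path)` and know which of several creates failed.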
ZooKeeper ZOOKEEPER-747

Add C# generation to Jute

New Feature Closed Major Fixed Eric Hauser Eric Hauser Eric Hauser 21/Apr/10 23:17   30/Jan/12 12:25 03/May/10 17:50   3.4.0 jute   0 5   The following patch adds a new language, C#, to the Jute code generation. The code that is generated does have a dependency on a third party library, Jon Skeet's MiscUtil, which is Apache licensed. The library is necessary because C# does not provide big endian support in the base class libraries.

As none of the existing Jute code has any unit tests, I have not added tests for this patch.
47620 No Perforce job exists for this issue. 1 33399
8 years, 8 weeks, 3 days ago
Reviewed
0|i062vz:
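For ZOOKEEPER-747, the need for a big-endian helper library is clearer when the encoding itself is spelled out. This Python sketch uses `struct` with the `>` (big-endian) prefix; it assumes Jute's binary format writes ints and longs in network byte order and strings as an int length followed by UTF-8 bytes, with -1 for a null string. The function names are hypothetical and this is not the generated C# code.

```python
import struct


def write_int(value):
    """Encode a 32-bit signed int in big-endian (network) byte order."""
    return struct.pack(">i", value)


def write_long(value):
    """Encode a 64-bit signed long in big-endian byte order."""
    return struct.pack(">q", value)


def write_string(value):
    """Length-prefixed UTF-8 string; None is written as length -1."""
    if value is None:
        return write_int(-1)
    data = value.encode("utf-8")
    return write_int(len(data)) + data


def read_int(buf, offset=0):
    """Decode a big-endian int; return (value, new_offset)."""
    (value,) = struct.unpack_from(">i", buf, offset)
    return value, offset + 4
```

A little-endian platform's native integer conversions would produce these bytes reversed, which is why a generator targeting such a runtime needs explicit big-endian helpers.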
ZooKeeper ZOOKEEPER-746

learner outputs session id to log in dec (should be hex)

Bug Closed Minor Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 21/Apr/10 18:10   23/Nov/11 14:22 26/Apr/10 00:18   3.3.1, 3.4.0 quorum, server   0 1   usability issue, should be in hex:

2010-04-21 11:31:13,827 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11354:Learner@95] - Revalidating client: 83353578391797760
47621 No Perforce job exists for this issue. 1 32890
9 years, 48 weeks, 3 days ago
Reviewed
0|i05zqv:
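For ZOOKEEPER-746, the hex formatting can be sketched in Python. The `0x` prefix and the 64-bit masking (so negative Java longs print like `Long.toHexString`) are assumptions about the desired convention; `format_session_id` is a hypothetical helper.

```python
def format_session_id(session_id):
    """Format a 64-bit session id as 0x-prefixed lowercase hex.

    Session ids are Java longs, so mask to 64 bits: negative values then
    print as their unsigned two's-complement form, as Long.toHexString does.
    """
    return "0x%x" % (session_id & 0xFFFFFFFFFFFFFFFF)
```

The "Revalidating client" line above would then show the id in hex, making it easy to match against other hex-formatted occurrences of the same session.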
ZooKeeper ZOOKEEPER-745

zkpython documentation

Task Open Major Unresolved Unassigned Henry Robinson Henry Robinson 21/Apr/10 13:35   21/Apr/10 13:35           0 0   zkpython deserves better documentation than the README I have given it. This jira is for tracking a document that includes at a minimum:

1. Installation instructions
2. Basic usage instructions, including common idiomatic use
3. API reference

214164 No Perforce job exists for this issue. 0 42157
9 years, 49 weeks, 1 day ago 0|i07kxb:
ZooKeeper ZOOKEEPER-744

Add monitoring four-letter word

New Feature Closed Major Fixed Andrei Savu Travis Crawford Travis Crawford 19/Apr/10 16:14   23/Nov/11 14:22 05/Jul/10 14:22 3.4.0 3.4.0 server   0 3   Filing a feature request based on a zookeeper-user discussion.

ZooKeeper should have a new four-letter word that returns key-value pairs appropriate for importing into a monitoring system (such as Ganglia, which has a large installed base).

This command should initially export the following:

(a) Count of instances in the ensemble.
(b) Count of up-to-date instances in the ensemble.

But be designed such that in the future additional data can be added. For example, the output could define the statistic in a comment, then print a key "space character" value line:

"""
# Total number of instances in the ensemble
zk_ensemble_instances_total 5
# Number of instances currently participating in the quorum.
zk_ensemble_instances_active 4
"""

From the mailing list:

"""
Date: Mon, 19 Apr 2010 12:10:44 -0700
From: Patrick Hunt <phunt@apache.org>
To: zookeeper-user@hadoop.apache.org
Subject: Re: Recovery issue - how to debug?

On 04/19/2010 11:55 AM, Travis Crawford wrote:
> It would be a lot easier from the operations perspective if the leader
> explicitly published some health stats:
>
> (a) Count of instances in the ensemble.
> (b) Count of up-to-date instances in the ensemble.
>
> This would greatly simplify monitoring & alerting - when an instance
> falls behind one could configure their monitoring system to let
> someone know and take a look at the logs.

That's a great idea. Please enter a JIRA for this - a new 4 letter word
and JMX support. It would also be a great starter project for someone
interested in becoming more familiar with the server code.

Patrick
"""
47622 No Perforce job exists for this issue. 5 33400
9 years, 38 weeks, 2 days ago Added a new four-letter word for monitoring: "mntr". The output is compatible with the Java properties format. Your script should expect content changes: new keys could be added in the future.
Reviewed
zookeeper monitoring 0|i062w7:
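For ZOOKEEPER-744, a monitoring script consuming the proposed output might parse it as below. This Python sketch follows the format suggested in the description (comment lines starting with `#`, then a key and a value separated by whitespace) and, per the release note, keeps unknown keys rather than failing on them. `parse_monitoring_output` is a hypothetical name; the text could be obtained with, for example, `echo mntr | nc localhost 2181`.

```python
def parse_monitoring_output(text):
    """Parse "key value" lines into a dict, skipping blanks and # comments.

    Splits on the first run of whitespace, so space- and tab-separated
    lines both work. Integer values are converted; others stay strings.
    """
    stats = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        try:
            stats[key] = int(value)
        except ValueError:
            stats[key] = value  # e.g. version strings; new keys are kept too
    return stats
```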
ZooKeeper ZOOKEEPER-743

Diagram error on Zookeeper internals page

Bug Open Trivial Unresolved Unassigned Ivan Kelly Ivan Kelly 16/Apr/10 10:20   16/Apr/10 10:20           0 0   http://hadoop.apache.org/zookeeper/docs/r3.1.2/zookeeperInternals.html

In the active messaging diagram, one of the commit arrows is going the wrong way.
214163 No Perforce job exists for this issue. 0 32891
9 years, 49 weeks, 6 days ago 0|i05zr3:
ZooKeeper ZOOKEEPER-742

Deallocating None on writes

Bug Closed Major Fixed Henry Robinson Josh Fraser Josh Fraser 15/Apr/10 19:21   23/Nov/11 14:22 22/Apr/10 02:44 3.2.2, 3.3.0 3.3.1, 3.4.0 c client, contrib, contrib-bindings   0 1   Redhat Enterprise 5.4 (python 2.4.3), Mac OS X 10.5.8 (python 2.5.1) On write operations, getting:

Fatal Python error: deallocating None
Aborted

This error happens on write operations only. Here's the backtrace:

Fatal Python error: deallocating None

Program received signal SIGABRT, Aborted.
0x000000383fc30215 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x000000383fc30215 in raise () from /lib64/libc.so.6
#1 0x000000383fc31cc0 in abort () from /lib64/libc.so.6
#2 0x00002adbd0be8189 in Py_FatalError () from /usr/lib64/libpython2.4.so.1.0
#3 0x00002adbd0bc7493 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#4 0x00002adbd0bcab66 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0
#5 0x00002adbd0bcbfe5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0
#6 0x00002adbd0bcc032 in PyEval_EvalCode () from /usr/lib64/libpython2.4.so.1.0
#7 0x00002adbd0be8729 in ?? () from /usr/lib64/libpython2.4.so.1.0
#8 0x00002adbd0be9bd8 in PyRun_SimpleFileExFlags () from /usr/lib64/libpython2.4.so.1.0
#9 0x00002adbd0bf000d in Py_Main () from /usr/lib64/libpython2.4.so.1.0
#10 0x000000383fc1d974 in __libc_start_main () from /lib64/libc.so.6
#11 0x0000000000400629 in _start ()
47623 No Perforce job exists for this issue. 4 32892
9 years, 49 weeks ago 0|i05zrb:
ZooKeeper ZOOKEEPER-741

root level create on REST proxy fails

Bug Closed Critical Fixed Patrick D. Hunt Patrick D. Hunt Patrick D. Hunt 14/Apr/10 13:12   23/Nov/11 14:22 22/Apr/10 01:43 3.3.0 3.3.1, 3.4.0 contrib   0 1   Creating /foo using the REST proxy fails.

Also upgrade to the latest Jersey/Grizzly while we are at it (fixes for func/security)
47624 No Perforce job exists for this issue. 1 32893
9 years, 49 weeks ago
Reviewed
0|i05zrj:
ZooKeeper ZOOKEEPER-740

zkpython leading to segfault on zookeeper

Bug Resolved Major Fixed Henry Robinson Federico Federico 13/Apr/10 04:08   24/Apr/14 21:45 24/Apr/14 21:45 3.3.0       0 5   The program we are implementing uses the Python binding for ZooKeeper, but it sometimes crashes with a segfault; here is the backtrace from gdb:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xad244b70 (LWP 28216)]
0x080611d5 in PyObject_Call (func=0x862fab0, arg=0x8837194, kw=0x0)
at ../Objects/abstract.c:2488
2488 ../Objects/abstract.c: No such file or directory.
in ../Objects/abstract.c
(gdb) bt
#0 0x080611d5 in PyObject_Call (func=0x862fab0, arg=0x8837194, kw=0x0)
at ../Objects/abstract.c:2488
#1 0x080d6ef2 in PyEval_CallObjectWithKeywords (func=0x862fab0,
arg=0x8837194, kw=0x0) at ../Python/ceval.c:3575
#2 0x080612a0 in PyObject_CallObject (o=0x862fab0, a=0x8837194)
at ../Objects/abstract.c:2480
#3 0x0047af42 in watcher_dispatch (zzh=0x86174e0, type=-1, state=1,
path=0x86337c8 "", context=0x8588660) at src/c/zookeeper.c:314
#4 0x00496559 in do_foreach_watcher (zh=0x86174e0, type=-1, state=1,
path=0x86337c8 "", list=0xa5354140) at src/zk_hashtable.c:275
#5 deliverWatchers (zh=0x86174e0, type=-1, state=1, path=0x86337c8 "",
list=0xa5354140) at src/zk_hashtable.c:317
#6 0x0048ae3c in process_completions (zh=0x86174e0) at src/zookeeper.c:1766
#7 0x0049706b in do_completion (v=0x86174e0) at src/mt_adaptor.c:333
#8 0x0013380e in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#9 0x002578de in clone () from /lib/tls/i686/cmov/libc.so.6
59683 No Perforce job exists for this issue. 1 32894
5 years, 47 weeks, 6 days ago 0|i05zrr:
ZooKeeper ZOOKEEPER-739

use a simple command-line parsing library for flexibility in command-line arguments

Improvement Open Major Unresolved Unassigned Karthik K Karthik K 12/Apr/10 19:10   12/Apr/10 19:10           0 2   JOpt Simple is used by the HBase team and is very lightweight.

http://jopt-simple.sourceforge.net/examples.html

<jopt.version>3.2</jopt.version>


Maven artifacts are available in public repositories, so integrating with Ivy should not be an issue either.

Please check whether that makes sense.
214162 No Perforce job exists for this issue. 0 42158
9 years, 50 weeks, 3 days ago 0|i07kxj:
ZooKeeper ZOOKEEPER-738

zookeeper.jute.h fails to compile with -pedantic

Bug Closed Major Fixed Jozef Hatala Patrick D. Hunt Patrick D. Hunt 10/Apr/10 17:17   23/Nov/11 14:22 26/Apr/10 14:58 3.3.0 3.3.1, 3.4.0 c client   0 1   /home/y/include/zookeeper/zookeeper.jute.h:96: error: extra semicolon
/home/y/include/zookeeper/zookeeper.jute.h:158: error: extra semicolon
/home/y/include/zookeeper/zookeeper.jute.h:288: error: extra semicolon

The code generator needs to be updated so that it does not emit a stray semicolon.
47625 No Perforce job exists for this issue. 1 32895
9 years, 48 weeks, 3 days ago
Reviewed
0|i05zrz:
ZooKeeper ZOOKEEPER-737

some 4 letter words may fail with netcat (nc)

Bug Closed Blocker Fixed Mahadev Konar Patrick D. Hunt Patrick D. Hunt 10/Apr/10 17:13   29/Dec/11 18:08 04/May/10 17:51 3.3.0 3.3.1, 3.4.0 server   0 3   nc closes its write channel as soon as it has sent its input, for example with "echo stat|nc localhost 2181".
In general this is fine; however, the server closes the socket as soon as it receives notice that nc has
closed its write channel. If the four-letter-word result has not yet been fully written back to the client, this causes
some or all of the result to be lost, i.e. the client does not see the full result. This was introduced in 3.3.0 as part
of a change to reduce blocking of the selector by long-running four-letter words.

Here is an example of the server logs when this happens:

echo -n stat | nc localhost 2181
2010-04-09 21:55:36,124 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251] - Accepted socket connection from /127.0.0.1:42179
2010-04-09 21:55:36,124 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@968] - Processing stat command from /127.0.0.1:42179
2010-04-09 21:55:36,125 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@606] - EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
2010-04-09 21:55:36,125 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1286] - Closed socket connection for client /127.0.0.1:42179 (no session established for client)
[phunt@gsbl90850 zookeeper-3.3.0]$ 2010-04-09 21:55:36,126 - ERROR [Thread-15:NIOServerCnxn@422] - Unexpected Exception:
java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:395)
at org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.checkFlush(NIOServerCnxn.java:907)
at org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.flush(NIOServerCnxn.java:945)
at java.io.BufferedWriter.flush(BufferedWriter.java:236)
at java.io.PrintWriter.flush(PrintWriter.java:276)
at org.apache.zookeeper.server.NIOServerCnxn$2.run(NIOServerCnxn.java:1089)
2010-04-09 21:55:36,126 - ERROR [Thread-15:NIOServerCnxn$Factory$1@82] - Thread Thread[Thread-15,5,main] died
java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:64)
at org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.wakeup(NIOServerCnxn.java:927)
at org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.checkFlush(NIOServerCnxn.java:909)
at org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.flush(NIOServerCnxn.java:945)
at java.io.BufferedWriter.flush(BufferedWriter.java:236)
at java.io.PrintWriter.flush(PrintWriter.java:276)
at org.apache.zookeeper.server.NIOServerCnxn$2.run(NIOServerCnxn.java:1089)
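The failure comes down to ordering: the client's EOF only means "no more input", so the connection should not be torn down until the reply has been flushed. Below is a simplified, blocking Java sketch of that ordering over a loopback socket. It is not the NIO-based fix in NIOServerCnxn, and the class and method names (FourLetterWordDemo, serveOnce, query, roundTrip) are hypothetical. The client mimics "echo -n stat | nc" by calling Socket.shutdownOutput() right after sending the command, then reading until EOF.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Simplified blocking sketch, not ZooKeeper's NIOServerCnxn: the server treats the
// client's half-close as "no more input" and writes its whole reply before closing.
public class FourLetterWordDemo {

    // Reads until the peer shuts down its write side (read() returns -1).
    static byte[] readToEof(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }

    // Serves one connection: read a 4-byte command, write the full reply, then close.
    static void serveOnce(ServerSocket server, byte[] reply) throws IOException {
        try (Socket s = server.accept()) {
            byte[] cmd = new byte[4];
            new DataInputStream(s.getInputStream()).readFully(cmd);
            OutputStream out = s.getOutputStream();
            out.write(reply);
            out.flush();
        } // the socket is closed only after write() has accepted the entire reply
    }

    // Client side, like "echo -n stat | nc host port": send, half-close, read to EOF.
    static byte[] query(int port, String cmd) throws IOException {
        try (Socket s = new Socket("127.0.0.1", port)) {
            s.setSoTimeout(5000); // fail instead of hanging if the server misbehaves
            s.getOutputStream().write(cmd.getBytes(StandardCharsets.US_ASCII));
            s.shutdownOutput(); // nc does this once its stdin reaches EOF
            return readToEof(s.getInputStream());
        }
    }

    // Runs one exchange over the loopback interface and returns what the client received.
    static byte[] roundTrip(String cmd, byte[] reply) {
        try (ServerSocket server = new ServerSocket(0)) {
            Thread serverThread = new Thread(() -> {
                try {
                    serveOnce(server, reply);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            serverThread.start();
            byte[] received = query(server.getLocalPort(), cmd);
            serverThread.join();
            return received;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] reply = new byte[256 * 1024]; // larger than a typical single socket write
        Arrays.fill(reply, (byte) 'x');
        byte[] got = roundTrip("stat", reply);
        System.out.println("sent " + reply.length + " bytes, client received " + got.length);
    }
}
```

The command is exactly four bytes ("stat", as sent by echo -n), so the server leaves no unread input behind when it closes.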
47626 No Perforce job exists for this issue. 7 32896
8 years, 24 weeks, 1 day ago
Reviewed
0|i05zs7:
ZooKeeper ZOOKEEPER-736

docs for server config options should specify which are required and which have defaults

Bug Open Major Unresolved Unassigned Patrick D. Hunt Patrick D. Hunt 08/Apr/10 19:17   05/Feb/20 07:16   3.3.0 3.7.0, 3.5.8 documentation   0 0   The admin docs should do a better job of specifying which config parameters are required and what their defaults are, if any. initLimit and syncLimit are
both examples where we don't do this.
66540 No Perforce job exists for this issue. 0 32897
9 years, 51 weeks ago 0|i05zsf:
ZooKeeper ZOOKEEPER-735

cppunit test testipv6 assumes that the machine is ipv6 enabled.

Bug Closed Major Fixed Mahadev Konar Mahadev Konar Mahadev Konar 07/Apr/10 17:26   23/Nov/11 14:22 08/Apr/10 14:32   3.3.1, 3.4.0 tests   0 1   The test should be fixed so that it runs only when IPv6 is enabled and is skipped otherwise. 47627 No Perforce job exists for this issue. 1 32898
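The affected test is a C++ cppunit test, but the guard itself is language-independent. As a hedged Java analogue (the class and method names are hypothetical, and binding to the loopback address ::1 is just one reasonable probe, not necessarily what the actual patch does), a test can first try to bind a socket to the IPv6 loopback address and skip itself if that fails:

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical helper: probe for IPv6 by binding a listening socket to the
// loopback address ::1 on an ephemeral port.
public class Ipv6Probe {

    static boolean ipv6LoopbackAvailable() {
        try (ServerSocket s = new ServerSocket()) {
            s.bind(new InetSocketAddress(InetAddress.getByName("::1"), 0));
            return true;
        } catch (IOException e) {
            // e.g. the host has IPv6 disabled, so ::1 cannot be bound
            return false;
        }
    }

    public static void main(String[] args) {
        if (!ipv6LoopbackAvailable()) {
            System.out.println("IPv6 not available; skipping IPv6 test");
            return;
        }
        System.out.println("IPv6 available; running IPv6 test");
    }
}
```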
9 years, 51 weeks ago
Reviewed
0|i05zsn:
ZooKeeper ZOOKEEPER-734

QuorumPeerTestBase.java and ZooKeeperServerMainTest.java do not handle windows path correctly

Bug Closed Major Fixed Vishal Kher Vishal Kher Vishal Kher 06/Apr/10 18:43   23/Nov/11 14:22 26/Apr/10 16:02 3.3.0 3.3.1, 3.4.0 tests   0 1   Windows 32-bit While running "ant test-core-java", QuorumPeerTestBase.java and ZooKeeperServerMainTest.java fail. The problem seems to be in ZooKeeperServerMainTest.java:MainThread():66 and in QuorumPeerTestBase.java:MainThread():76.

FileWriter.write() writes the Windows path to the conf file, and the backslashes in that path are not read back correctly when the conf file is parsed. As a result, the test complains that it cannot find myid and fails.

Solution: convert the Windows path to a UNIX-style path. This worked for me on Windows; the diffs are below. The solution has not been tested on Linux because the build is currently failing there for reasons unrelated to this change.


vmc-floorb-dhcp116-114:/opt/zksrc/zookeeper-3.3.0/src/java/test/org/apache/zookeeper/server # svn diff
Index: ZooKeeperServerMainTest.java
===================================================================
--- ZooKeeperServerMainTest.java (revision 931240)
+++ ZooKeeperServerMainTest.java (working copy)
@@ -61,7 +61,8 @@
if (!dataDir.mkdir()) {
throw new IOException("unable to mkdir " + dataDir);
}
- fwriter.write("dataDir=" + dataDir.toString() + "\n");
+ String data = dataDir.toString().replace('\\', '/');
+ fwriter.write("dataDir=" + data + "\n");

fwriter.write("clientPort=" + clientPort + "\n");
fwriter.flush();
Index: quorum/QuorumPeerTestBase.java
===================================================================
--- quorum/QuorumPeerTestBase.java (revision 931240)
+++ quorum/QuorumPeerTestBase.java (working copy)
@@ -73,7 +73,8 @@
if (!dataDir.mkdir()) {
throw new IOException("Unable to mkdir " + dataDir);
}
- fwriter.write("dataDir=" + dataDir.toString() + "\n");
+ String data = dataDir.toString().replace('\\', '/');
+ fwriter.write("dataDir=" + data + "\n");

fwriter.write("clientPort=" + clientPort + "\n");
fwriter.write(quorumCfgSection + "\n");
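A hedged illustration of why the replace('\\', '/') change works, assuming the conf file is parsed with java.util.Properties (to our understanding, this is how ZooKeeper's QuorumPeerConfig reads it). Properties.load treats a backslash as an escape character and silently drops it before characters that are not recognized escapes, so a Windows path written verbatim does not survive the round trip. The class and method names below are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper names; the normalization mirrors the replace('\\', '/') in the diff.
public class ConfigPathDemo {

    // Convert Windows separators to forward slashes before writing "dataDir=...".
    static String toConfigPath(String path) {
        return path.replace('\\', '/');
    }

    // Parse config text the way java.util.Properties does and return one value.
    static String loadValue(String configText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(configText));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        String windowsDir = "C:\\zk\\data"; // the runtime value is C:\zk\data
        // Written verbatim, Properties drops each backslash here: prints C:zkdata
        System.out.println(loadValue("dataDir=" + windowsDir + "\n", "dataDir"));
        // Normalized first, the path survives the round trip: prints C:/zk/data
        System.out.println(loadValue("dataDir=" + toConfigPath(windowsDir) + "\n", "dataDir"));
    }
}
```

A path segment starting with t, n, r or f would be worse still, since \t, \n, \r and \f are decoded as control characters rather than dropped.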
47628 No Perforce job exists for this issue. 1 32899
9 years, 48 weeks, 3 days ago
Reviewed
0|i05zsv:
ZooKeeper ZOOKEEPER-733

use netty to handle client connections

Improvement Closed Major Fixed Patrick D. Hunt Benjamin Reed Benjamin Reed 05/Apr/10 10:44   24/Mar/17 13:17 18/Aug/10 02:25   3.4.0 server   0 4   We currently have our own asynchronous NIO socket engine so that a single thread can handle lots of clients. Over time the engine has become more complicated. We would also like the engine to use multiple threads on machines with many cores, and we would like to support things like SSL. If we switch to Netty, we can simplify our code and gain these benefits. 47629 No Perforce job exists for this issue. 13 33401
9 years, 28 weeks, 1 day ago
Reviewed
0|i062wf:
ZooKeeper ZOOKEEPER-732

Improper translation of error into Python exception

Bug Closed Minor Fixed Lei Zhang Gustavo Niemeyer Gustavo Niemeyer 29/Mar/10 18:48   13/Mar/14 14:17 03/Oct/13 17:51 3.3.0 3.4.6, 3.5.0 contrib-bindings   0 6   Apparently errors returned by the C library are not being correctly converted into a Python exception in some cases:

>>> zookeeper.get_children(0, "/", None)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
SystemError: error return without exception set
47630 No Perforce job exists for this issue. 4 32900
6 years, 2 weeks ago A client that uses the Python binding may receive a SystemError on session expiration.
Reviewed
0|i05zt3:
ZooKeeper ZOOKEEPER-731

Zookeeper#delete , #create - async versions miss a verb in the javadoc

Bug Closed Minor Fixed Thomas Koch Karthik K Karthik K 26/Mar/10 19:12   23/Nov/11 14:22 06/Sep/11 10:12 3.3.0 3.4.0 documentation   0 2   /**
* The Asynchronous version of delete. "The request doesn't *missing* actually until
* the asynchronous callback is called."
*/
public void delete(final String path, int version, VoidCallback cb, Object ctx) ..


It would also be useful for the javadoc to explain how to instantiate the callback objects and the context.
47631 No Perforce job exists for this issue. 3 32901
8 years, 29 weeks, 2 days ago 0|i05ztb:
ZooKeeper ZOOKEEPER-730

C cli: Add a command to recursively delete a znode

New Feature Open Major Unresolved Unassigned Karthik K Karthik K 26/Mar/10 18:38   14/Dec/19 06:08     3.7.0 c client   0 4   ZOOKEEPER-729 covers recursively deleting a znode in Java. Once that review is complete and frozen, equivalent functionality needs to be available in the C client as well.

This is the tracking JIRA for that work.
214161 No Perforce job exists for this issue. 0 42159
5 years, 47 weeks, 6 days ago 0|i07kxr:
ZooKeeper ZOOKEEPER-729

Recursively delete a znode - zkCli.sh rmr /node

New Feature Closed Major Fixed Karthik K Karthik K Karthik K 26/Mar/10 15:30   28/Dec/11 11:13 15/Apr/10 03:26   3.4.0 java client   0 2   Recursively delete a given znode in ZooKeeper from the command line.

A new operation, "rmr", is added to the command-line client (zkCli.sh).

$ ./zkCli.sh rmr /node
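ZooKeeper refuses to delete a znode that still has children (KeeperException.NotEmptyException), so a recursive delete has to remove the descendants first, i.e. visit the tree in post-order. The sketch below shows only that ordering, over a hypothetical in-memory map of paths to child paths; the class and method names are made up for illustration. With the real Java client, each step would typically list children with ZooKeeper#getChildren(path, false) and remove a node with ZooKeeper#delete(path, -1). Such a client-side recursive delete is not atomic: another client can create a child between the listing and the delete.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of a znode tree, used only to show the traversal order.
public class RecursiveDeleteOrder {

    // children maps a znode path to the full paths of its direct children.
    static List<String> deleteOrder(String root, Map<String, List<String>> children) {
        List<String> order = new ArrayList<>();
        collect(root, children, order);
        return order;
    }

    // Post-order: every child subtree is scheduled before the node itself,
    // because ZooKeeper refuses to delete a znode that still has children.
    private static void collect(String path, Map<String, List<String>> children, List<String> order) {
        for (String child : children.getOrDefault(path, Collections.<String>emptyList())) {
            collect(child, children, order);
        }
        order.add(path);
    }

    public static void main(String[] args) {
        Map<String, List<String>> tree = new HashMap<>();
        tree.put("/node", Arrays.asList("/node/a", "/node/b"));
        tree.put("/node/a", Arrays.asList("/node/a/x"));
        // prints [/node/a/x, /node/a, /node/b, /node]
        System.out.println(deleteOrder("/node", tree));
    }
}
```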
47632 No Perforce job exists for this issue. 5 33402
8 years, 13 weeks, 1 day ago
Reviewed
0|i062wn:
Generated at Fri Mar 20 00:35:58 UTC 2020 by Song Xu using Jira 8.3.4#803005-sha1:1f96e09b3c60279a408a2ae47be3c745f571388b.